Abstract. In recent years, methods to estimate the memory parameter using wavelet 
analysis have gained popularity in many areas of science. Despite their widespread use, a 
rigorous semi-parametric asymptotic theory, comparable to the one developed for Fourier 
methods, is still missing. In this contribution, we adapt to the wavelet setting the classical 
semi-parametric framework introduced by Robinson and his co-authors for estimating the 
memory parameter of a (possibly) non-stationary process. Our results apply to a class of 
wavelets with bounded supports, which include but are not limited to Daubechies wavelets. 
We derive an explicit expression of the spectral density of the wavelet coefficients and show 
that it can be approximated, at large scales, by the spectral density of the continuous-time 
wavelet coefficients of fractional Brownian motion. We derive an explicit bound for the 
difference between the spectral densities. As an application, we obtain minimax upper 
bounds for the log-scale regression estimator of the memory parameter for a Gaussian 
process and we derive an explicit expression of its asymptotic variance. 
1. Introduction 

Let $X \stackrel{\mathrm{def}}{=} \{X_k\}_{k \in \mathbb{Z}}$ be a real-valued process, not necessarily stationary, and let $\Delta^K X$ denote its $K$-th order difference. The first order difference is $[\Delta X]_k = X_k - X_{k-1}$ and $\Delta^K$ is defined recursively. The process $X$ is said to have memory parameter $d$, $d \in \mathbb{R}$ (in short, is an $M(d)$ process) if for any integer $K > d - 1/2$, the $K$-th order difference process $\Delta^K X$ is weakly stationary with spectral density function

$$ f_{\Delta^K X}(\lambda) = |1 - e^{-i\lambda}|^{2(K-d)} f^*(\lambda), \quad \lambda \in (-\pi, \pi), \tag{1} $$

where $f^*$ is a non-negative symmetric function which is bounded on $(-\pi, \pi)$ and is bounded away from zero in a neighborhood of the origin. $M(d)$ processes encompass both stationary and non-stationary processes, depending on the value of the memory parameter $d$. When $d < 1/2$, the process $X$ is covariance stationary and its spectral density is given by

$$ f(\lambda) = |1 - e^{-i\lambda}|^{-2d} f^*(\lambda). \tag{2} $$

The process $X$ is said to have long-memory if $0 < d < 1/2$, short-memory if $d = 0$ and negative memory if $d < 0$; the process is not invertible if $d < -1/2$. When $d > 1/2$, the process is non-stationary. In this case, the $f$ in (2) is not integrable on $[-\pi, \pi]$ and is therefore not a spectral density. In the terminology of Yaglom (1958), this $f$ is called a generalized spectral density. It corresponds to a process $X$ whose increments of sufficiently high order are covariance stationary.
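
As a small illustration of the differencing operator used in the definition above, the following sketch applies the $K$-th order difference with NumPy. The function name `difference` and the simulated random walk are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np

def difference(x, K):
    """Return the K-th order difference of the sequence x.

    [Delta x]_k = x_k - x_{k-1}, applied K times; the output is K samples shorter.
    """
    return np.diff(np.asarray(x, dtype=float), n=K)

rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)   # white noise, an M(0) process
walk = np.cumsum(noise)             # random walk, an M(1) process (d = 1)
recovered = difference(walk, 1)     # first difference gives back the noise (minus one sample)
```

Here the random walk is non-stationary, while its first difference is the stationary white noise it was built from, which is the situation $K = 1 > d - 1/2$ with $d = 1$.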

The memory parameter $d$ plays a central role in the definition of $M(d)$ processes and is often the focus of empirical interest. In the parametric case one can use approximate MLE methods (Fox and Taqqu (1986)) or MLE (Dahlhaus (1989)). In the semi-parametric case, where only a class of functions $f^*$ is specified, two types of methods have emerged to estimate the memory parameter $d$: Fourier and wavelet methods. Frequency-domain techniques are now well documented and understood (see for instance Hurvich and Ray (1995), Velasco (1999), Velasco and Robinson (2000) and Hurvich et al. (2002)).

In this paper, we focus on wavelet methods and consider the regression estimator introduced in Abry and Veitch (1998), which involves estimating $d$ using the slope of the regression of the logarithm of the scale spectrum on the scale index. This estimator is now widely used in many different fields (see e.g. Veitch and Abry (1999) for applications to network traffic; Percival and Walden (2000) and Papanicolaou and Solna (2003) for applications in physical sciences; see e.g. Gencay et al. (2001) and Bayraktar et al. (2004) for applications in finance). The regression estimator is well-suited to process large data sets, since it has low computational complexity due to the pyramidal algorithm for computing the detail coefficients. Also, it is robust with respect to additive polynomial trends (see for instance Veitch and Abry (1999) and Craigmile et al. (2005)). In Moulines, Roueff and Taqqu (2005), we study another estimator of $d$ obtained by adapting the local Whittle estimator to the wavelet context.

Despite its widespread use, a rigorous semi-parametric asymptotic theory of the regression estimator, comparable to the one developed for corresponding estimators based on the periodogram, is still missing (the concluding remarks in Velasco (1999) about “the lack of rigorous asymptotic theory (...) if the spectral density is not proportional to $\lambda^{-2d}$ for all frequencies” for wavelet-based estimates are still valid). There are results in a related parametric framework (see Bardet (2002) and Bayraktar et al. (2004)). To the best of our knowledge, the only attempt in a semi-parametric setting is due to Bardet et al. (2000). The process, however, is supposed to be observed in continuous time (discretization issues were not discussed) and the results do not directly translate to discrete-time observations in a semi-parametric framework. The main objective of this paper is to fill this gap.

The paper is organized as follows. Examples of $M(d)$ processes are given in Section 2. In Section 3, we introduce wavelets and wavelet transforms for time-series. We do not assume that the wavelets are orthonormal nor that they result from a multiresolution analysis. In Section 4, we derive an explicit expression for the covariance and spectral density of the wavelet coefficients of an $M(d)$ process at a given scale. We extend this result to pairs of scales by grouping, in an appropriate way, the wavelet coefficients. The results apply to a general class of wavelets with bounded supports, which include but are not limited to Daubechies wavelets. If $f^*$ belongs to a class of smooth functions $\mathcal{H}(\beta, L)$ with smoothness exponent $\beta$ defined in (22), we show that the spectral density of the wavelet coefficients of an $M(d)$ process can be approximated, at large scales, by the spectral density of the wavelet coefficients of fractional Brownian motion (FBM) and derive an explicit bound for the difference between these two quantities. Our result holds not only for $d \in (1/2, 3/2)$, which corresponds to the standard range for the Hurst index, $H = d - 1/2 \in (0,1)$, but for all $d \in \mathbb{R}$, by interpreting the corresponding FBM as a generalized process with spectral density $|\lambda|^{-2d}$, $\lambda \in \mathbb{R}$. We show that the relative $L^\infty$ error between the spectral densities of the wavelet coefficients decreases exponentially fast to zero with a rate given by the smoothness exponent $\beta$ of $f^*$. In Section 5, we consider (possibly non-stationary) Gaussian processes and obtain an explicit expression for the limiting variance of the estimator of $d$ based on the regression of the log-scale spectrum. We show that this estimator is rate optimal in the minimax sense. The subsequent sections contain proofs. One appendix involves approximations of wavelet filter transfer functions, and another derives an inequality for the mean and the covariance of the logarithm of quadratic forms of Gaussian variables.


2. Examples 


Stationarity of the increments is commonly assumed in time-series analysis. In ARIMA models, for example, (1) holds with $d = K$ integer and with $f^*$ equal to the spectral density of an autoregressive moving average short-memory process. If $d \in \mathbb{R}$ and $f^* \equiv \sigma^2$ in (1), one gets the so-called fractionally integrated white noise process, ARFIMA(0,d,0). The choice $d \in \mathbb{R}$ and

$$ f^*(\lambda) = f_{\mathrm{ARMA}}(\lambda) = \sigma^2 \, \frac{\left|1 - \sum_{k=1}^{q} \theta_k e^{-i\lambda k}\right|^2}{\left|1 - \sum_{k=1}^{p} \phi_k e^{-i\lambda k}\right|^2}, \quad \lambda \in (-\pi, \pi), \tag{3} $$

with $1 - \sum_{k=1}^{p} \phi_k z^k \neq 0$ for $|z| = 1$ and $1 - \sum_{k=1}^{q} \theta_k \neq 0$ (so that $f_{\mathrm{ARMA}}(0) \neq 0$) leads to the class of ARFIMA(p, d, q) processes.
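
The generalized spectral density of an ARFIMA(p, d, q) process can be evaluated directly from (1) and (3). The sketch below is only an illustration: the function names `f_arma` and `f_arfima` and the example parameter values are assumptions chosen here, not part of the paper.

```python
import numpy as np

def _one_minus_poly(coeffs, z):
    """Evaluate 1 - sum_k coeffs[k-1] * z**k for k = 1, ..., len(coeffs)."""
    out = np.ones_like(z)
    for k, c in enumerate(coeffs, start=1):
        out = out - c * z ** k
    return out

def f_arma(lam, theta=(), phi=(), sigma2=1.0):
    """Short-memory factor f* of (3): MA coefficients theta, AR coefficients phi."""
    z = np.exp(-1j * np.asarray(lam, dtype=float))
    return sigma2 * np.abs(_one_minus_poly(theta, z)) ** 2 / np.abs(_one_minus_poly(phi, z)) ** 2

def f_arfima(lam, d, theta=(), phi=(), sigma2=1.0):
    """Generalized spectral density |1 - e^{-i lam}|^{-2d} f*(lam)."""
    lam = np.asarray(lam, dtype=float)
    return np.abs(1.0 - np.exp(-1j * lam)) ** (-2.0 * d) * f_arma(lam, theta, phi, sigma2)

lam = np.linspace(0.1, np.pi, 50)            # avoid lam = 0, where f diverges for d > 0
density = f_arfima(lam, d=0.3, phi=(0.5,))   # an ARFIMA(1, 0.3, 0) example
```

Since $|1 - e^{-i\lambda}| = 2|\sin(\lambda/2)|$, the long-memory factor behaves like $|\lambda|^{-2d}$ near the origin, while `f_arma` stays bounded and bounded away from zero there.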

Another example is $\{B_H(k)\}_{k \in \mathbb{Z}}$, a discrete-time version of fractional Brownian motion (FBM) $\{B_H(t), t \in \mathbb{R}\}$ with Hurst index $H \in (0,1)$. The latter is a centered Gaussian process with covariance

$$ R_H(t, s) = \mathbb{E}[B_H(t) B_H(s)] = \frac{1}{2} \left\{ |t|^{2H} + |s|^{2H} - |t - s|^{2H} \right\}. $$

The process $\{B_H(k)\}_{k \in \mathbb{Z}}$ is increment stationary ($K = 1$) and its generalized spectral density is given up to a multiplicative constant (see Samorodnitsky and Taqqu (1994)) by

$$ f_{\mathrm{FBM}}(\lambda) = \sum_{k=-\infty}^{\infty} |\lambda + 2k\pi|^{-2H-1}, \quad \lambda \in (-\pi, \pi). $$

We can express it in the form (2),

$$ f_{\mathrm{FBM}}(\lambda) = |1 - e^{-i\lambda}|^{-2d} f^*_{\mathrm{FBM}}(\lambda), \tag{4} $$

by setting $d = H + 1/2 \in (1/2, 3/2)$ and

$$ f^*_{\mathrm{FBM}}(\lambda) = \left| \frac{2 \sin(\lambda/2)}{\lambda} \right|^{2H+1} + |2 \sin(\lambda/2)|^{2H+1} \sum_{k \neq 0} |\lambda + 2k\pi|^{-2H-1}. \tag{5} $$

Observe that $f^*_{\mathrm{FBM}}(0) = 1$ and that it is bounded on $(-\pi, \pi)$.
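
The two properties just stated for $f^*_{\mathrm{FBM}}$ can be checked numerically. The following is a minimal sketch in which the infinite sum in (5) is truncated; the function name `f_star_fbm` and the truncation level `n_terms` are assumptions of this example, and the truncation makes the values approximate away from $\lambda = 0$.

```python
import numpy as np

def f_star_fbm(lam, H, n_terms=10_000):
    """Approximate f*_FBM of (5), keeping the terms 0 < |k| <= n_terms of the sum."""
    lam = np.asarray(lam, dtype=float)
    # |2 sin(lam/2) / lam| equals |sinc(lam / (2 pi))| with NumPy's normalized sinc,
    # which also handles the limit value 1 at lam = 0.
    first = np.abs(np.sinc(lam / (2.0 * np.pi))) ** (2 * H + 1)
    k = np.arange(1, n_terms + 1)
    k = np.concatenate((-k, k))                      # all k != 0 up to the truncation
    tail = np.sum(np.abs(lam[..., None] + 2 * np.pi * k) ** (-2 * H - 1), axis=-1)
    return first + np.abs(2.0 * np.sin(lam / 2.0)) ** (2 * H + 1) * tail

grid = np.linspace(-3.0, 3.0, 61)
values = f_star_fbm(grid, H=0.7)      # stays finite and positive on the grid
at_zero = f_star_fbm(0.0, H=0.7)      # the second term vanishes at the origin
```

Dividing `f_star_fbm` by $|2\sin(\lambda/2)|^{2H+1}$ recovers the (truncated) series defining $f_{\mathrm{FBM}}$, which is the content of (4).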
The process $G_H = \Delta B_H$ is fractional Gaussian noise (FGN). It is a stationary Gaussian process with spectral density proportional to (4), but with $d = H - 1/2 \in (-1/2, 1/2)$.

By convention, throughout the paper, while $d$ may take values in $\mathbb{R}$, $H$ will always be restricted to take values in $(0,1)$.


3. Discrete Wavelet Transform 


In this section, we introduce the main concepts required to define a discrete wavelet transform. Denote by $L^2(\mathbb{R})$ the set of square integrable functions with respect to the Lebesgue measure. Let $\phi \in L^2(\mathbb{R})$ and $\psi \in L^2(\mathbb{R})$ be two functions and define their Fourier transforms as

$$ \hat\phi(\xi) = \int_{-\infty}^{\infty} \phi(t) e^{-i\xi t} \, dt \quad \text{and} \quad \hat\psi(\xi) = \int_{-\infty}^{\infty} \psi(t) e^{-i\xi t} \, dt. $$

Consider the following assumptions:

(W-1) $\phi$ and $\psi$ are compactly-supported, integrable, and $\hat\phi(0) = \int_{-\infty}^{\infty} \phi(t) \, dt = 1$ and $\int_{-\infty}^{\infty} \psi^2(t) \, dt = 1$.

(W-2) There exists $\alpha > 1$ such that $\sup_{\xi \in \mathbb{R}} |\hat\psi(\xi)| (1 + |\xi|)^{\alpha} < \infty$.

(W-3) The function $\psi$ has $M$ vanishing moments, i.e. $\int_{-\infty}^{\infty} t^m \psi(t) \, dt = 0$ for all $m = 0, \dots, M-1$.

(W-4) The function $\sum_{k \in \mathbb{Z}} k^m \phi(\cdot - k)$ is a polynomial of degree $m$ for all $m = 0, \dots, M-1$.

Assumption (W-1) implies that $\hat\phi$ and $\hat\psi$ are everywhere infinitely differentiable. The exponent $\alpha$ in (W-2) is related to the rate of decrease of the Fourier transform $\hat\psi$ of the wavelet $\psi$ and hence to the regularity of $\psi$. Under (W-1), (W-3) is equivalent to asserting that the first $M - 1$ derivatives of $\hat\psi$ vanish at the origin. This implies, using a Taylor expansion, that

$$ |\hat\psi(\xi)| = O(|\xi|^M) \quad \text{as } \xi \to 0. \tag{6} $$

By (Cohen, 2003, Theorem 2.8.1, Page 90), under (W-1), (W-4) is equivalent to

$$ \sup_{k \neq 0} |\hat\phi(\xi + 2k\pi)| = O(|\xi|^M) \quad \text{as } \xi \to 0. \tag{7} $$

Adopting the engineering convention that large values of the scale index $j$ correspond to coarse scales (low frequencies), we define the family $\{\psi_{j,k}, j \in \mathbb{Z}, k \in \mathbb{Z}\}$ of translated and dilated functions

$$ \psi_{j,k}(t) = 2^{-j/2} \psi(2^{-j} t - k). \tag{8} $$
Many authors suppose that the $\psi_{j,k}$ are orthogonal and even that they are generated by a multiresolution analysis (MRA). Assumptions (W-1)-(W-4) are indeed quite standard in the context of a multiresolution analysis (in which case, $\phi$ is the scaling function and $\psi$ is the associated wavelet), see for instance Cohen (2003). In this paper, we do not assume that wavelets are orthonormal nor that they are associated to a multiresolution analysis. We may therefore work with other convenient choices for $\phi$ and $\psi$ as long as (W-1)-(W-4) are satisfied. A simple example is to set, for some positive integer $N$,

$$ \phi(x) \stackrel{\mathrm{def}}{=} \mathbb{1}_{[0,1]}^{\otimes N}(x) \quad \text{and} \quad \psi(x) \stackrel{\mathrm{def}}{=} c_N \frac{d^N}{dx^N} \mathbb{1}_{[0,1]}^{\otimes 2N}(x), $$

where $c_N$ is a normalizing constant, $\mathbb{1}_A$ is the indicator function of the set $A$ and, for a non-negative function $f$, $f^{\otimes N}$ denotes the $N$-th self-convolution of $f$. It follows that

$$ |\hat\phi(\xi)| = |2 \sin(\xi/2)/\xi|^{N} \quad \text{and} \quad |\hat\psi(\xi)| = c_N |\xi|^N \, |2 \sin(\xi/2)/\xi|^{2N}. $$

Using (6) and (7), one easily checks that (W-1)-(W-4) are satisfied with $M$ and $\alpha$ equal to $N$. Of course the family of functions $\{\psi_{j,k}\}$ is not orthonormal for this choice of the wavelet function $\psi$ (and the function $\phi$ is not associated to an MRA). Nevertheless, to ease references to previously reported works, with a slight abuse in the terminology, we still call $\phi$ and $\psi$ the scaling and the wavelet functions.

Having defined the functions $\phi$ and $\psi$, we now define what we call the Discrete Wavelet Transform in discrete time. Start with a real-valued sequence $\{x_k, k \in \mathbb{Z}\}$. Using the scaling function $\phi$, we first associate to the sequence $\{x_k, k \in \mathbb{Z}\}$ the continuous-time functions

$$ x_n(t) \stackrel{\mathrm{def}}{=} \sum_{k=1}^{n} x_k \, \phi(t - k) \quad \text{and} \quad x(t) \stackrel{\mathrm{def}}{=} \sum_{k \in \mathbb{Z}} x_k \, \phi(t - k), \quad t \in \mathbb{R}. \tag{9} $$

The wavelet coefficients involve $\{x(t), t \in \mathbb{R}\}$ and are defined as

$$ W^x_{j,k} \stackrel{\mathrm{def}}{=} \int_{-\infty}^{\infty} x(t) \psi_{j,k}(t) \, dt, \quad j \ge 0, \; k \in \mathbb{Z}. \tag{10} $$

Without loss of generality we may suppose that the support of the scaling function $\phi$ is included in $(-T, 0)$ for some integer $T \ge 1$. Then $x_n(t) = x(t)$ for all $t \in (0, n - T + 1)$. We may also suppose that the support of the wavelet function $\psi$ is included in $(0, T)$. Then, the support of $\psi_{j,k}$ is included in the interval $(2^j k, 2^j (k + T))$. Hence

$$ W^x_{j,k} = \int_{-\infty}^{\infty} x_n(t) \psi_{j,k}(t) \, dt, \tag{11} $$

when $(2^j k, 2^j (k + T)) \subset (0, n - T + 1)$, that is, for all $(j, k) \in \mathcal{I}_n$, where

$$ \mathcal{I}_n \stackrel{\mathrm{def}}{=} \{(j, k) : j \ge 0, \; 0 \le k \le 2^{-j}(n - T + 1) - T\}. \tag{12} $$


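For any $j$, the wavelet coefficients $\{W^x_{j,k}\}_{k \in \mathbb{Z}}$ are obtained by discrete convolution and downsampling. More precisely, under (W-1), for all $j \ge 0$, $k \in \mathbb{Z}$, (9) and (10) imply

$$ W^x_{j,k} = \sum_{l \in \mathbb{Z}} x_l \, h_{j, 2^j k - l} = (h_{j,\cdot} \star x)_{2^j k} = (\downarrow^j [h_{j,\cdot} \star x])_k, \tag{13} $$

where $\star$ denotes the convolution of discrete sequences,

$$ h_{j,l} \stackrel{\mathrm{def}}{=} 2^{-j/2} \int_{-\infty}^{\infty} \phi(t + l) \psi(2^{-j} t) \, dt, $$

and, for any sequence $\{c_k\}_{k \in \mathbb{Z}}$ and any integer $l$, $\downarrow^l$ is the downsampling operator defined as $(\downarrow^l c)_k = c_{2^l k}$. Define, for all $j \ge 0$, the discrete Fourier transform of $\{h_{j,l}\}_{l \in \mathbb{Z}}$ as

$$ H_j(\lambda) \stackrel{\mathrm{def}}{=} \sum_{l \in \mathbb{Z}} h_{j,l} e^{-i\lambda l} = 2^{-j/2} \int_{-\infty}^{\infty} \sum_{l \in \mathbb{Z}} \phi(t + l) e^{-i\lambda l} \psi(2^{-j} t) \, dt. \tag{14} $$

Since $\phi$ and $\psi$ have compact support, the sum in (14) has a finite number of non-vanishing terms and $H_j$ is a trigonometric polynomial.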


Remark 1. By a corollary established later in the paper, there exists an integer $j_0$ only depending on $\phi$ and $\psi$ such that, for all $j \ge j_0$, the trigonometric polynomial $H_j(\lambda)$ is not identically zero. In the case of a multiresolution analysis, the father and mother wavelets are defined in such a way that $j_0 = 0$. In the general case, by dilating $\psi$ appropriately, or, in other words, by changing the reference scale, one can impose $j_0 = 0$, which is assumed in the sequel.

Under assumption (W-4), $t \mapsto \sum_{l \in \mathbb{Z}} \phi(t + l) l^m$ is a polynomial of degree $m$ and (W-3) therefore implies that, for all $j \ge 0$ and all $m = 0, \dots, M - 1$,

$$ \sum_{l \in \mathbb{Z}} h_{j,l} \, l^m = 2^{-j/2} \int_{-\infty}^{\infty} \psi(2^{-j} t) \sum_{l \in \mathbb{Z}} \phi(t + l) l^m \, dt = 0. \tag{15} $$

Now consider $P_j(x) = \sum_{l \in \mathbb{Z}} h_{j,l} x^l$ and observe that (15) implies $P_j(1) = 0$, $P_j'(1) = 0$, ..., $P_j^{(M-1)}(1) = 0$, and hence $H_j(\lambda) = P_j(e^{-i\lambda})$ factors as

$$ H_j(\lambda) = (1 - e^{-i\lambda})^M \tilde H_j(\lambda), \tag{16} $$

where $\tilde H_j(\lambda)$ is also a trigonometric polynomial. The wavelet coefficient (13) may therefore be computed as

$$ W^x_{j,k} = (\downarrow^j [\tilde h_{j,\cdot} \star \Delta^M x])_k, \tag{17} $$
where $\{\tilde h_{j,l}\}_{l \in \mathbb{Z}}$ are the coefficients of the trigonometric polynomial $\tilde H_j$ and $\Delta^M x$ is the $M$-th order difference of the sequence $x$. In other words, the use of a wavelet and a scaling function satisfying (W-4) and (W-3) implicitly performs an $M$-th order differencing of the time-series. Therefore, we may work with a $K$-th order integrated process $X$ without specific preprocessing, provided that $M \ge K$. This is in sharp contrast with Fourier methods, where the time series must be explicitly differenced at least $K$ times and a data taper must be applied on the differenced series to avoid frequency-domain leakage (see, for instance, Hurvich et al. (2002)).
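
The implicit differencing can be illustrated on a toy filter. In the sketch below, the smooth factor `h_tilde` is a hypothetical choice made only for illustration (it is not derived from any particular wavelet); the filter `h` is then built so that its transfer function has the factored form $(1 - e^{-i\lambda})^M$ times that of `h_tilde`, and a polynomial trend of degree $M - 1$ is seen to have no effect on the filtered output.

```python
import numpy as np

def difference_filter(M):
    """Coefficients of (1 - z)^M, the impulse response of the M-th order difference."""
    h = np.array([1.0])
    for _ in range(M):
        h = np.convolve(h, [1.0, -1.0])
    return h

M = 2
h_tilde = np.array([0.25, 0.5, 0.25])             # hypothetical smooth factor, illustration only
h = np.convolve(difference_filter(M), h_tilde)    # filter with M = 2 vanishing moments

rng = np.random.default_rng(1)
k = np.arange(200, dtype=float)
noise = rng.standard_normal(k.size)
trend = 3.0 + 0.5 * k                             # polynomial trend of degree M - 1 = 1

# mode="valid" keeps only outputs computed from fully overlapping samples (no edge effects)
out_with_trend = np.convolve(noise + trend, h, mode="valid")
out_noise_only = np.convolve(noise, h, mode="valid")

# Same output computed as in the factored form: difference first, then apply h_tilde
out_factored = np.convolve(np.diff(noise + trend, n=M), h_tilde, mode="valid")
```

The arrays `out_with_trend`, `out_noise_only` and `out_factored` agree up to rounding, which is the discrete-filter counterpart of the robustness to polynomial trends mentioned in the introduction.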


4. Spectral Density of the Wavelet Coefficients 


Because the wavelet coefficients at a given scale are obtained by applying time-invariant linear filters, computing the covariance of the wavelet coefficients of $K$-th order stationary processes is an easy exercise. The following proposition provides an integral expression for the covariance between two wavelet coefficients on possibly different scales, expressed in terms of the transfer function $H_j$ of the wavelet filters and the generalized spectral density of the process $X$. This proposition extends Theorem 2 in Masry (1993) on the spectral measure of the DWT coefficients of increment stationary continuous-time processes to the discrete-time setting and Lemma 1 in Craigmile and Percival (2005) to functions $\psi$ and $\phi$ that do not necessarily define a multiresolution analysis.


Proposition 1. Let $X$ be a $K$-th order integrated process with generalized spectral density $f$. Assume (W-1)-(W-4) with $M \ge K$. Then, for all $j, j' \ge 0$ and $k, k' \in \mathbb{Z}$,

$$ \mathrm{Cov}(W^X_{j,k}, W^X_{j',k'}) = \int_{-\pi}^{\pi} e^{i\lambda(k 2^j - k' 2^{j'})} f(\lambda) H_j(\lambda) \overline{H_{j'}(\lambda)} \, d\lambda, \tag{18} $$

where the wavelet coefficient $W^X_{j,k}$ is defined in (10).

The proof follows from elementary results on time-invariant linear filtering of covariance stationary processes, using (16) and (17), applied to $\Delta^M X$, which is covariance stationary with spectral density $|1 - e^{-i\lambda}|^{2M} f(\lambda)$.

By (18), for a given scale $j$, the process $\{W^X_{j,k}\}_{k \in \mathbb{Z}}$ is covariance stationary. The situation is more complicated when considering two different scales $j \neq j'$, because the two-dimensional sequence $\{[W^X_{j,k}, W^X_{j',k}]^T\}_{k \in \mathbb{Z}}$, with $T$ denoting the transpose, is not stationary for $j \neq j'$. This is a consequence of the pyramidal wavelet scheme, where at scale $j$, the wavelet coefficients are downsampled by a factor $2^j$ which depends on $j$: for $j \ge j'$, the exponent $k 2^j - k' 2^{j'}$ in (18) depends on $k$ and $k'$ only through $2^{j-j'} k - k'$.

Thus, to obtain a stationary sequence, one should consider the process $\{[W^X_{j,k}, W^X_{j', 2^{j-j'} k}]^T\}_{k \in \mathbb{Z}}$ for $j > j'$, which involves a downsampled subsequence of the coefficients at the finer scale $j'$. One can also consider the process $\{[W^X_{j,k}, W^X_{j', 2^{j-j'} k + v}]^T\}_{k \in \mathbb{Z}}$ for $j > j'$, which includes a translation of the location index of the second component by $v$. It turns out that the most convenient is to merge the processes corresponding to $v = 0, \dots, 2^{j-j'} - 1$ and hence to consider the between-scale process

$$ \{[W^X_{j,k}, \, W^X_{j,k}(j - j')^T]^T\}_{k \in \mathbb{Z}}, \tag{19} $$

where for any $u = 0, 1, \dots, j$,

$$ W^X_{j,k}(u) \stackrel{\mathrm{def}}{=} [W^X_{j-u, 2^u k}, \, W^X_{j-u, 2^u k + 1}, \, \dots, \, W^X_{j-u, 2^u k + 2^u - 1}]^T \tag{20} $$

is a $2^u$-dimensional vector of wavelet coefficients at scale $j' = j - u$. The vector $W^X_{j,k}(u)$ involves all possible translations of the position index $2^u k$ by $v = 0, 1, \dots, 2^u - 1$. The index $u$ in (20) denotes the scale difference $j - j' \ge 0$ between the finest scale $j'$ and the coarsest scale $j$. Observe that $W^X_{j,k}(0)$ ($u = 0$) is the scalar $W^X_{j,k}$.

One should view the between-scale process (19) as a pair made up of the scalar process $\{W^X_{j,k}\}_{k \in \mathbb{Z}}$ and the vector process $\{W^X_{j,k}(j - j')\}_{k \in \mathbb{Z}}$. We shall now express their cross spectral density in terms of the generalized spectral density of $X$ and the transfer function of the wavelet filters folded on the interval $[-\pi, \pi]$. By setting $j' = j$, or equivalently $u = 0$, we obtain the spectral density of the “within scale” process $\{W^X_{j,k}\}_{k \in \mathbb{Z}}$.

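Corollary 2. Define for all $0 \le u \le j$ and $\lambda \in (-\pi, \pi)$,

$$ D_{j,u}(\lambda; f, \phi, \psi) \stackrel{\mathrm{def}}{=} \sum_{l=0}^{2^j - 1} e_u(\lambda + 2l\pi) \, f(2^{-j}(\lambda + 2l\pi)) \, 2^{-j/2} H_j(2^{-j}(\lambda + 2l\pi)) \, 2^{-(j-u)/2} \overline{H_{j-u}(2^{-j}(\lambda + 2l\pi))}, \tag{21} $$

where for all $\xi \in \mathbb{R}$,

$$ e_u(\xi) \stackrel{\mathrm{def}}{=} 2^{-u/2} \left[ 1, \, e^{-i 2^{-u} \xi}, \, \dots, \, e^{-i (2^u - 1) 2^{-u} \xi} \right]^T. $$

Then

$$ \mathrm{Cov}_f(W^X_{j,k}, W^X_{j,k'}(u)) = \int_{-\pi}^{\pi} e^{i\lambda(k - k')} D_{j,u}(\lambda; f, \phi, \psi) \, d\lambda. $$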
In other words, 

• for all $j \ge 0$, the within-scale process $\{W^X_{j,k}\}_{k \in \mathbb{Z}}$ is covariance stationary with spectral density $D_{j,0}(\cdot; f, \phi, \psi)$, 

• for all $j \ge u > 0$, the between-scale process $\{[W^X_{j,k}, W^X_{j,k}(u)^T]^T\}_{k \in \mathbb{Z}}$ is covariance stationary with cross spectral density $D_{j,u}(\cdot; f, \phi, \psi)$. 

Note that $D_{j,u}$ is a $2^u$-dimensional vector and, in particular, $D_{j,0}$ is a scalar. The $2^u$-dimensional vector $e_u(\xi)$ has Euclidean norm $|e_u(\xi)| = 1$. 

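Proof. Let $j \ge u \ge 0$. By (18), we have, for all $k, k' \in \mathbb{Z}$ and $v = 0, \dots, 2^u - 1$,

$$ \mathrm{Cov}_f(W^X_{j,k}, W^X_{j-u, 2^u k' + v}) = \int_0^{2\pi} e^{i\lambda(2^j k - 2^{j-u}(2^u k' + v))} f(\lambda) H_j(\lambda) \overline{H_{j-u}(\lambda)} \, d\lambda. $$

The exponential can be factorized as $e^{i 2^j \lambda (k - k')} e^{-i v 2^{-u} 2^j \lambda}$. Hence

$$ \mathrm{Cov}_f(W^X_{j,k}, W^X_{j,k'}(u)) = \int_0^{2\pi} e^{i\lambda 2^j (k - k')} e_u(2^j \lambda) f(\lambda) H_j(\lambda) \, 2^{u/2} \, \overline{H_{j-u}(\lambda)} \, d\lambda $$
$$ = \int_0^{2^j (2\pi)} e^{i\lambda(k - k')} e_u(\lambda) f(2^{-j}\lambda) \, 2^{-j/2} H_j(2^{-j}\lambda) \, 2^{-(j-u)/2} \overline{H_{j-u}(2^{-j}\lambda)} \, d\lambda. $$

The result is obtained by folding and shifting the previous integral as follows, setting $g(\lambda) = e_u(\lambda) f(2^{-j}\lambda) \, 2^{-j/2} H_j(2^{-j}\lambda) \, 2^{-(j-u)/2} \overline{H_{j-u}(2^{-j}\lambda)}$,

$$ \int_0^{2^j (2\pi)} e^{i\lambda(k - k')} g(\lambda) \, d\lambda = \sum_{l=0}^{2^j - 1} \int_{2l\pi}^{2(l+1)\pi} e^{i\lambda(k - k')} g(\lambda) \, d\lambda = \sum_{l=0}^{2^j - 1} \int_0^{2\pi} e^{i\lambda(k - k')} g(\lambda + 2l\pi) \, d\lambda = \int_0^{2\pi} e^{i\lambda(k - k')} \left( \sum_{l=0}^{2^j - 1} g(\lambda + 2l\pi) \right) d\lambda. $$

The function in parentheses is $(2\pi)$-periodic because $\sum_{l=0}^{2^j - 1} g(\lambda + 2(l+1)\pi) = \sum_{l=1}^{2^j - 1} g(\lambda + 2l\pi) + g(\lambda + 2^j(2\pi)) = \sum_{l=0}^{2^j - 1} g(\lambda + 2l\pi)$ since $g$ is $2^j(2\pi)$-periodic. Hence $\int_0^{2\pi}$ can be replaced by $\int_{-\pi}^{\pi}$, which gives the result. □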


We now apply the preceding results to the class of processes with memory parameter $d \in \mathbb{R}$ (see (1)). To obtain error bounds on the variance and the spectral density of the wavelet coefficients, some additional assumptions are required on the smoothness of $f^*$ at zero frequency.

Definition 1. For $0 < \beta \le 2$ and $L > 0$, define the function class $\mathcal{H}(\beta, L)$ as the set of even non-negative functions $g$ on $[-\pi, \pi]$ such that, for all $\lambda \in [-\pi, \pi]$,

$$ |g(\lambda) - g(0)| \le L \, g(0) \, |\lambda|^{\beta}. \tag{22} $$

This type of assumption is typical in the semi-parametric estimation setting (see for instance Robinson (1995) and Moulines and Soulier (2002)). The larger the value of $\beta$, the smoother the function at the origin. For $g$ even (as assumed) and infinitely differentiable, $g'(0) = 0$ and hence, by a Taylor expansion, (22) holds with $\beta = 2$.

Since, for instance, $f_{\mathrm{ARMA}}$ in (3) is infinitely differentiable, it belongs to $\mathcal{H}(2, L)$ for some $L$. As for FBM and FGN, consider $f^*_{\mathrm{FBM}}$ in (5). The first term in the RHS is $1 + O(\lambda^2)$ and the second is $O(|\lambda|^{2H+1})$. Hence, for some positive constant $L$, $|f^*_{\mathrm{FBM}}(\lambda) - f^*_{\mathrm{FBM}}(0)| \le L \, f^*_{\mathrm{FBM}}(0) \, |\lambda|^{(2H+1) \wedge 2}$, where $a \wedge b = \min(a, b)$; hence

$$ f^*_{\mathrm{FBM}} \in \mathcal{H}((2H + 1) \wedge 2, L). \tag{23} $$

The expressions of the within- and between-scale wavelet coefficient spectral densities $D_{j,u}(\cdot; f, \phi, \psi)$ given in Corollary 2 depend both on $d$ and on the function $f^*$ and will therefore be denoted by $D_{j,u}(\cdot; d, f^*, \phi, \psi)$ in the sequel. We are going to show, however, that these quantities may be approximated by quantities which depend only on the memory parameter $d$ and $f^*(0)$. Let $X$ have a generalized spectral density $f(\lambda) = |1 - e^{-i\lambda}|^{-2d} f^*(\lambda)$ and define

$$ \sigma_j^2(d, f^*) \stackrel{\mathrm{def}}{=} \mathrm{Var}[W^X_{j,0}] = \int_{-\pi}^{\pi} |1 - e^{-i\lambda}|^{-2d} f^*(\lambda) |H_j(\lambda)|^2 \, d\lambda, \tag{24} $$

the variance of the wavelet coefficient of the process $X$ at scale $j$.


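Theorem 3. Let $M \ge 1$ be an integer and $\alpha$, $L$, $\beta$ be constants such that $\alpha > 1$, $0 < L < \infty$ and $\beta \in (0, 2]$. Assume that (W-1)-(W-4) hold with $M$ and $\alpha$.

(a) Let $d_{\min}$ and $d_{\max}$ be two constants such that

$$ [d_{\min}, d_{\max}] \subset ((1 + \beta)/2 - \alpha, \, M + 1/2). \tag{25} $$

Then, there exists a constant $C > 0$ (only depending on the constants $\beta$, $d_{\min}$, $d_{\max}$ and the functions $\phi$ and $\psi$) such that, for all $j \ge 0$, $d \in [d_{\min}, d_{\max}]$ and $f^* \in \mathcal{H}(\beta, L)$,

$$ \left| \sigma_j^2(d, f^*) - f^*(0) \, K(d, \psi) \, 2^{2jd} \right| \le C \, f^*(0) \, L \, 2^{(2d - \beta) j}, \tag{26} $$

where $K(d, \psi)$ is given by

$$ K(d, \psi) \stackrel{\mathrm{def}}{=} \int_{-\infty}^{\infty} |\xi|^{-2d} \, |\hat\psi(\xi)|^2 \, d\xi. \tag{27} $$

(b) Let $d_{\min}$ and $d_{\max}$ be two constants such that

$$ [d_{\min}, d_{\max}] \subset ((1 + \beta)/2 - \alpha, \, M]. \tag{28} $$

Then, for all $u \ge 0$, there exists $C > 0$ (only depending on $u$ and on the constants $\beta$, $d_{\min}$, $d_{\max}$ and the functions $\phi$ and $\psi$) such that, for all $\lambda \in (-\pi, \pi)$, $j \ge u$, $d \in [d_{\min}, d_{\max}]$ and $f^* \in \mathcal{H}(\beta, L)$,

$$ \left| D_{j,u}(\lambda; d, f^*, \phi, \psi) - f^*(0) \, D_{\infty,u}(\lambda; d, \psi) \, 2^{2jd} \right| \le C \, f^*(0) \, L \, 2^{(2d - \beta) j}, \tag{29} $$

where $|\cdot|$ denotes the Euclidean norm in any dimension and, for all $u \ge 0$,

$$ D_{\infty,u}(\lambda; d, \psi) \stackrel{\mathrm{def}}{=} \sum_{l \in \mathbb{Z}} |\lambda + 2l\pi|^{-2d} \, e_u(\lambda + 2l\pi) \, \overline{\hat\psi(\lambda + 2l\pi)} \, \hat\psi(2^{-u}(\lambda + 2l\pi)). \tag{30} $$

The function $(\lambda, d) \mapsto D_{\infty,u}(\lambda; d, \psi)$ is $(2\pi)$-periodic in $\lambda$ and jointly continuous on $\mathbb{R} \times [d_{\min}, d_{\max}]$. When $u = 0$, $D_{\infty,0}$ is a scalar and

$$ \int_{-\pi}^{\pi} D_{\infty,0}(\lambda; d, \psi) \, d\lambda = K(d, \psi) \neq 0. \tag{31} $$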
The proof, based on approximating the wavelet filter transfer function, can be found in a later section. In order to shed light on Theorem 3, we conclude this section with a number of remarks.

Remark 2. Theorem 3 states that $f^*(0) \, K(d, \psi) \, 2^{2jd}$ is a good approximation for $\mathrm{Var}[W^X_{j,0}]$ and that for any $u \ge 0$, $f^*(0) \, D_{\infty,u}(\lambda; d, \psi) \, 2^{2jd}$ is a good $L^\infty(-\pi, \pi)$ approximation to the spectral density $D_{j,u}(\lambda; d, f^*, \phi, \psi)$.

Remark 3. Relation (29) with $u = 0$ implies (26) since $\left| \int_{-\pi}^{\pi} g_1(\lambda) \, d\lambda - \int_{-\pi}^{\pi} g_2(\lambda) \, d\lambda \right| \le 2\pi \| g_1 - g_2 \|_\infty$. Observe, however, that (26) is valid under Condition (25), which is weaker than (28).

Remark 4. Under Condition (28), for all $p > 0$,

$$ \| D_{\infty,0}(\cdot; d, \psi) \|_p \stackrel{\mathrm{def}}{=} \left( \int_{-\pi}^{\pi} |D_{\infty,0}(\lambda; d, \psi)|^p \, d\lambda \right)^{1/p} \tag{32} $$

is positive, finite and continuous in $d \in [d_{\min}, d_{\max}]$. This follows from joint continuity and (31).
Remark 5. The spectral density $f^*(0) \, D_{\infty,u}(\lambda; d, \psi) \, 2^{2jd}$, $\lambda \in (-\pi, \pi)$, is in fact the spectral density of the wavelet coefficients of the generalized fractional Brownian motion $B_{(d)}$, where $d \in \mathbb{R}$. The process $B_{(d)}$ is parameterized by a family $\Theta_{(d)}$ of smooth “test” functions $\theta(t)$, $t \in \mathbb{R}$, and is defined as follows: $\{B_{(d)}(\theta), \theta \in \Theta_{(d)}\}$ is a mean zero Gaussian process with covariance

$$ \mathrm{Cov}(B_{(d)}(\theta_1), B_{(d)}(\theta_2)) = \int_{\mathbb{R}} |\lambda|^{-2d} \, \hat\theta_1(\lambda) \, \overline{\hat\theta_2(\lambda)} \, d\lambda. \tag{33} $$

The finiteness of the integral $\int_{\mathbb{R}} |\lambda|^{-2d} |\hat\theta(\lambda)|^2 \, d\lambda$ provides a constraint on the family $\Theta_{(d)}$. For instance, when $d > 1/2$, this condition requires that $\hat\theta(\lambda)$ decays sufficiently quickly at the origin and, when $d < 0$, it requires that $\hat\theta(\lambda)$ decreases sufficiently rapidly at infinity. Hence $\theta$ can be a wavelet $\psi$ if $d \in (1/2 - \alpha, M + 1/2)$, which corresponds to (25) with $\beta = 0$. The discrete wavelet transform of $B_{(d)}$ is defined as

$$ W^{(d)}_{j,k} \stackrel{\mathrm{def}}{=} B_{(d)}(\psi_{j,k}), \quad j \in \mathbb{Z}, \; k \in \mathbb{Z}. \tag{34} $$

The spectral density $f^*(0) \, D_{\infty,u}(\lambda; d, \psi) \, 2^{2jd}$ in (29), which serves as an approximation to $D_{j,u}(\lambda; d, f^*, \phi, \psi)$, is in fact, up to the multiplicative constant $f^*(0)$, the cross spectral density between the wavelet coefficients $W^{(d)}_{j,k}$ and the vector of wavelet coefficients $W^{(d)}_{j,k}(u) \stackrel{\mathrm{def}}{=} [W^{(d)}_{j-u, 2^u k}, \dots, W^{(d)}_{j-u, 2^u k + 2^u - 1}]^T$. Indeed, using (33), (34) and $\hat\psi_{j,k}(\lambda) = 2^{j/2} \hat\psi(2^j \lambda) e^{-i k \lambda 2^j}$, one has

$$ \mathrm{Cov}(W^{(d)}_{j,k}, W^{(d)}_{j,k'}(u)) = 2^j \int_{\mathbb{R}} |\lambda|^{-2d} \, e_u(-2^j \lambda) \, \hat\psi(2^j \lambda) \, \overline{\hat\psi(2^{j-u} \lambda)} \, e^{i\lambda 2^j (k' - k)} \, d\lambda $$
$$ = 2^{2dj} \int_{-\pi}^{\pi} D_{\infty,u}(\lambda; d, \psi) \, e^{i\lambda(k - k')} \, d\lambda, \tag{35} $$

where the last equality is obtained by the change of variable $-2^j \lambda \to \lambda$ and by folding the integral on $(-\pi, \pi)$.

The within- and between-scale spectral densities $D_{j,u}(\lambda; d, f^*, \phi, \psi)$ of the process $X$ with memory parameter $d$ may thus be approximated by those of the DWT of the generalized FBM $B_{(d)}$ (viewed as a generalized process), with an $L^\infty$ error bounded by the RHS in (29).


Remark 6. When $d$ belongs to $(1/2, 3/2)$, $B_{(d)}$ is related to $B_H(t)$, $t \in \mathbb{R}$, by setting $H = d - 1/2 \in (0,1)$ and, up to a multiplicative constant,

$$ B_{(d)}(\theta) = \int_{\mathbb{R}} B_H(t) \theta(t) \, dt, $$

where the equality holds in the sense of finite-dimensional distributions and hence $\{W^{(d)}_{j,k}, j, k \in \mathbb{Z}\}$ has the same distribution as $\{\int_{\mathbb{R}} B_H(s) \psi_{j,k}(s) \, ds, \; j, k \in \mathbb{Z}\}$. It follows from the previous remark that, for such $d$ and $H$, the spectral density of the wavelet coefficients of $X$ can be approximated by that of continuous-time fractional Brownian motion.

Remark 7. Once normalized by $2^{2jd}$, which is the order of the variance of the wavelet coefficients at scale $j$ (see (26)), the difference of the spectral densities in (29) is bounded by a constant times $2^{-\beta j}$, a factor which tends to zero exponentially fast as $j \to \infty$. The rate of the decrease is determined by the smoothness exponent $\beta$ of $f^*$.

Remark 8. If $d = 0$ and $\{\psi_{j,k}, k \in \mathbb{Z}, j \in \mathbb{Z}\}$ is an orthonormal system, then (33) and (34) imply

$$ \mathrm{Cov}(W^{(0)}_{j,k}, W^{(0)}_{j',k'}) = \int_{\mathbb{R}} \hat\psi_{j,k}(\lambda) \overline{\hat\psi_{j',k'}(\lambda)} \, d\lambda = 2\pi \int_{\mathbb{R}} \psi_{j,k}(t) \psi_{j',k'}(t) \, dt, $$

which vanishes if $j \neq j'$ or $k \neq k'$. Hence, when the memory parameter $d = 0$ and the wavelets are orthonormal, the wavelet coefficients $\{W^X_{j,k}, k \in \mathbb{Z}\}$ are asymptotically uncorrelated as $j \to \infty$ and their asymptotic variance is $2\pi$. Using (35), the corresponding cross spectral density is given by

$$ D_{\infty,u}(\lambda; 0, \psi) = 0 \text{ if } u > 0 \quad \text{and} \quad D_{\infty,0}(\lambda; 0, \psi) = 1, \quad \lambda \in (-\pi, \pi). \tag{36} $$

Remark 9. To understand the presence of the asymptotic form $f^*(0) \, 2^{2jd} \, D_{\infty,u}(\lambda; d, \psi)$ in (29), start with $D_{j,u}(\lambda; f, \phi, \psi)$ in (21), use the $2^j(2\pi)$-periodicity to replace $\sum_{l=0}^{2^j - 1}$ by $\sum_{l=-2^{j-1}+1}^{2^{j-1}}$, and as $j \to \infty$, approximate $f$ by the spectral density $f^*(0) |\lambda|^{-2d}$ of $\sqrt{f^*(0)} \, B_{(d)}$, $H_j(\lambda)$ by its asymptotic approximation $2^{j/2} \hat\phi(\lambda) \overline{\hat\psi(2^j \lambda)}$ (see the appendix on wavelet filter transfer functions) and $H_{j-u}(\lambda)$ by $2^{(j-u)/2} \hat\phi(\lambda) \overline{\hat\psi(2^{j-u} \lambda)}$, approximate $\hat\phi(2^{-j}(\lambda + 2l\pi))$ by $\hat\phi(0) = 1$ and approximate $\sum_{l=-2^{j-1}+1}^{2^{j-1}}$ by $\sum_{l=-\infty}^{\infty}$.

Remark 10. Let us examine how Theorem 3 applies when $X(k) = B_H(k)$, $k \in \mathbb{Z}$, that is, $X$ is a discrete-time version of FBM with Hurst index $H \in (0,1)$. From (4), (5) and (23), we have $d = H + 1/2 \in (1/2, 3/2)$ and $f^* \in \mathcal{H}((2H + 1) \wedge 2, L)$ for some constant $L$. The condition on $M$ is then $M > H$ in case (a) and $M \ge H + 1/2$ in case (b). The condition on $\alpha$ is $\alpha > \beta/2 - H = (1 - H) \wedge 1/2$ in both cases, which is satisfied because $\alpha > 1$ and $H \in (0,1)$. Theorem 3 can therefore be applied irrespectively of the value of $H$ when $\psi$ is a Daubechies wavelet with at least $M = 2$ vanishing moments.
Remark 11. In case of FGN, $d = H - 1/2 \in (-1/2, 1/2)$ and hence, compared to the previous case, the bound on $M$ decreases by 1 and the bound on $\alpha$ increases by 1. Thus the conditions are $M > H - 1$ in case (a) and $M \ge H - 1/2$ in case (b), that is, $M = 1$ works in either case. The condition on $\alpha$ becomes $\alpha > (2 - H) \wedge 3/2$, that is, $\alpha > 3/2$ will work for all $H \in (0,1)$. Since the Daubechies wavelet with $M = 2$ has an $\alpha = 1.3390$ (as given by Formula (7.1.23), Page 225 in Daubechies (1992); see footnote 2), the condition on $\alpha$ is satisfied only for $H > 0.67$. How should one proceed in a situation where $H \in (0,1)$ is unknown? There are three alternatives: 1) Use $M \ge 3$ since the condition is satisfied for the Daubechies wavelet with $M \ge 3$ (for which $\alpha > 1.63$). 2) Sum the FGN to get a discrete-time version of FBM as above and use $M \ge 2$. 3) Use $M = 2$ and apply Theorem 3 with a smoothness index $\beta' < \beta$ instead of $\beta$, worsening the bounds of that theorem.


Remark 12. When $d < 0$, it is not $M$ but $\alpha$ which should influence the choice of the wavelet. The more negative the value of $d$, the higher the required value of $\alpha$. Recall that a high value of $\alpha$ corresponds to a fast decrease of $\hat\psi(\xi)$ as $|\xi| \to \infty$.


Footnote 2. The $\alpha$ in the table on Page 226 of Daubechies (1992) is our $\alpha$ minus 1.

5. Analysis of the memory parameter estimator based on the regression of the wavelet variance

In this section, we consider a Gaussian process $X$ with memory parameter $d$ and generalized spectral density $f(\lambda) = |1 - e^{-i\lambda}|^{-2d} f^*(\lambda)$. Then, for any $K > (d - 1/2)$, the distribution of the $K$-th order increment process $\Delta^K X$ only depends on $d$ and $f^*$. We apply Theorem 3 to study the wavelet estimator of the memory parameter $d$, based on the regression of the scale spectrum $\sigma_j^2(d, f^*)$ with respect to the scale index $j$. This is reasonable because, for large scale $j$, $\log \sigma_j^2(d, f^*)$ is approximately an affine function of $j$ with slope $(2 \log 2) \, d$ (see (26) in Theorem 3). Given $n$ observations $X_1, \dots, X_n$, $\sigma_j^2(d, f^*)$ can be estimated by the empirical variance

$$ \hat\sigma_j^2 \stackrel{\mathrm{def}}{=} n_j^{-1} \sum_{k=0}^{n_j - 1} (W^X_{j,k})^2, \tag{37} $$

where for any $j$, $n_j$ denotes the number of available wavelet coefficients at scale index $j$, namely, from (12),

$$ n_j = [2^{-j}(n - T + 1) - T + 1], \tag{38} $$

where $n$ is the size of the time series and $[x]$ denotes the integer part of $x$. An estimator of the memory parameter $d$ is then obtained by regressing the logarithm of the empirical variance $\log(\hat\sigma_j^2)$ for a finite number of scale indices $j \in \{J_0, \dots, J_0 + \ell\}$ where $J_0$ is the lower scale and $1 + \ell \ge 2$ is the number of scales in the regression. For a sample size equal to $n$, this estimator is well defined for $J_0$ and $\ell$ such that $\ell \ge 1$ and $J_0 + \ell \le J(n)$ where

$$ J(n) \stackrel{\mathrm{def}}{=} [\log_2(n - T + 1) - \log_2(T)] \tag{39} $$

is the maximal index $j$ such that $n_j \ge 1$. The regression estimator can be expressed formally as

$$ \hat d_n(J_0, w) \stackrel{\mathrm{def}}{=} \sum_{j=J_0}^{J_0 + \ell} w_{j - J_0} \log(\hat\sigma_j^2), \tag{40} $$

where the vector $w \stackrel{\mathrm{def}}{=} [w_0, \dots, w_\ell]^T$ satisfies

$$ \sum_{i=0}^{\ell} w_i = 0 \quad \text{and} \quad 2 \log(2) \sum_{i=0}^{\ell} i \, w_i = 1. \tag{41} $$

One may choose, for example, $w$ corresponding to the weighted least-squares regression vector, defined by

$$w = D B (B^T D B)^{-1} b , \tag{42}$$

where

$$b = [0 \quad (2\log(2))^{-1}]^T , \qquad B = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & 1 & \cdots & \ell \end{bmatrix}^T$$

is the so-called design matrix and $D$ is a positive definite matrix. Ordinary least squares regression corresponds to the case where $D$ is the identity matrix.
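As an illustration of (40)-(42), the following minimal sketch computes the regression vector for a diagonal $D$ and applies the estimator to noise-free variances. The function names `wls_weights` and `log_regression_estimate`, the restriction to diagonal $D$, and the numerical values are assumptions made for the example; they are not part of the text.

```python
import math

def wls_weights(diag_d):
    """Regression vector w = D B (B^T D B)^{-1} b for a diagonal D.

    diag_d: the ell+1 positive diagonal entries of D. Restricting D to be
    diagonal is an assumption of this sketch, not of the text.
    """
    s0 = sum(diag_d)
    s1 = sum(i * di for i, di in enumerate(diag_d))
    s2 = sum(i * i * di for i, di in enumerate(diag_d))
    det = s0 * s2 - s1 * s1          # determinant of the 2x2 matrix B^T D B
    c = 1.0 / (2.0 * math.log(2.0))  # second entry of b
    return [c * di * (i * s0 - s1) / det for i, di in enumerate(diag_d)]

def log_regression_estimate(emp_var, j0, w):
    """Weighted sum of log empirical variances over scales j0, ..., j0 + ell."""
    return sum(wi * math.log(emp_var[j0 + i]) for i, wi in enumerate(w))

# Ordinary least squares: D is the identity matrix.
ell, j0 = 4, 2
w = wls_weights([1.0] * (ell + 1))

# Noise-free check: variances exactly equal to K * 2^(2 d j) return d.
d_true, K = 0.3, 1.7
emp_var = {j: K * 2.0 ** (2.0 * d_true * j) for j in range(j0, j0 + ell + 1)}
d_hat = log_regression_estimate(emp_var, j0, w)
```

The two constraints on $w$ are exactly what makes the constant $\log K$ drop out and the slope $2d\log 2$ map to $d$, so `d_hat` agrees with `d_true` up to rounding in this idealized case.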

We now compute a bound of the mean square error and an asymptotic equivalent of the variance of $\hat d_n(J_0, w)$ in the usual semi-parametric framework adopted by Robinson and his co-authors for studying Fourier estimators. For the wavelet estimator defined above, these quantities depend primarily on $n$ and on the scale index $J_0$, while in the Fourier case, the bounds are generally expressed as functions of $n$ and a bandwidth parameter $m$, equal to the number of discrete Fourier frequencies used. To ease comparison, we will express our results with respect to $n$ and $m$, where $m$ is the number of wavelet coefficients appearing in $\hat d_n(J_0, w)$, namely,

$$m \stackrel{\mathrm{def}}{=} \sum_{j=J_0}^{J_0+\ell} n_j .$$

Since $\sum_{j=J_0}^{J_0+\ell} 2^{-j} = 2^{-J_0}(2 - 2^{-\ell})$, one gets immediately from (38) that $|m - n 2^{-J_0}(2 - 2^{-\ell})| \le 2(\ell + 1)(T - 1)$. Thus $m \to \infty$ is equivalent to having $n 2^{-J_0} \to \infty$, and, when these conditions hold, we have

$$m(n) \sim n 2^{-J_0(n)} (2 - 2^{-\ell}) . \tag{43}$$
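The counting formulas (38), (39) and the definition of $m$ can be checked numerically. The sketch below is only illustrative: the function names and the values $n = 4096$, $T = 4$, $J_0 = 3$, $\ell = 4$ are assumptions chosen for the example.

```python
import math

def n_coeffs(n, T, j):
    """n_j: number of available wavelet coefficients at scale index j."""
    return math.floor(2.0 ** (-j) * (n - T + 1) - T + 1)

def max_scale(n, T):
    """J(n): the largest scale index j with n_j >= 1."""
    return math.floor(math.log2(n - T + 1) - math.log2(T))

def total_coeffs(n, T, j0, ell):
    """m: total number of coefficients used over scales j0, ..., j0 + ell."""
    return sum(n_coeffs(n, T, j) for j in range(j0, j0 + ell + 1))

# Hypothetical values, chosen only for illustration.
n, T, j0, ell = 4096, 4, 3, 4
m = total_coeffs(n, T, j0, ell)
approx = n * 2.0 ** (-j0) * (2.0 - 2.0 ** (-ell))  # leading term of m
gap_bound = 2 * (ell + 1) * (T - 1)                # bound on |m - approx|
J = max_scale(n, T)
```

With these values the gap `abs(m - approx)` stays below `gap_bound`, and `n_coeffs(n, T, J)` is at least one while `n_coeffs(n, T, J + 1)` is not.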


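The next result provides a bound to the bias $E[\hat d_n(J_0, w)] - d$ and to the variance $\mathrm{Var}[\hat d_n(J_0, w)]$.

Theorem 4. Assume that (W-1)-(W-4) hold with $M \ge 1$, $\alpha > 1$ and that $X$ is Gaussian. Let $w$ be a vector satisfying (41) for some $\ell \ge 1$. Let $d_{\min}$, $d_{\max}$ be two scalars such that $d_{\min} < d_{\max}$ and $[d_{\min}, d_{\max}] \subset ((1 + \beta)/2 - \alpha, M]$, where $\beta \in (0, 2]$. Then, there exists a finite constant $C$ (depending only on $w$, $\beta$, $L$, $d_{\min}$, $d_{\max}$, $\phi$ and $\psi$) such that for all $J_0 \in \{0, \dots, J(n) - \ell\}$, $d \in [d_{\min}, d_{\max}]$, and $f^* \in \mathcal{H}(\beta, L)$ with $f^*(0) > 0$,

$$\left| E[\hat d_n(J_0, w)] - d \right| \le C \left\{ \left(\frac{m}{n}\right)^{\beta} + m^{-1} \right\} , \tag{44}$$

$$\mathrm{Var}[\hat d_n(J_0, w)] \le C \left\{ m^{-1} + \mathbb{1}\left( \frac{m}{n} > C^{-1} \right) \right\} . \tag{45}$$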


Remark 13. While the bias term bound contains $(m/n)^{\beta}$, the variance bound has an indicator function which is zero for sufficiently small values of $m/n$, hence is $o((m/n)^{2\beta})$. This indicator function cannot be dispensed with. Indeed if we start our estimation at the finest scale $J_0 = 0$, corresponding roughly to $m = n$, all we can say is that $\mathrm{Var}(\hat d_n) \le C$. If, however, we start our estimation at a coarse enough scale $J_0$, corresponding to $m/n \le C^{-1}$, then $\mathrm{Var}(\hat d_n)$ is bounded by $C m^{-1}$, which tends to zero as $m \to \infty$.


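By combining (44) and (45) it is possible to obtain a bound on the mean square error of $\hat d_n(J_0, w)$. More precisely, there exists a constant $C$ (depending only on $M$, $\alpha$, $\beta$, $L$, $d_{\min}$ and $d_{\max}$) such that, for any $f^* \in \mathcal{H}(\beta, L)$, $d \in [d_{\min}, d_{\max}]$ and $J_0 \in \{0, \dots, J(n) - \ell\}$,

$$E\left[ \left( \hat d_n(J_0, w) - d \right)^2 \right] \le C \left\{ \left(\frac{m}{n}\right)^{2\beta} + m^{-1} \right\} . \tag{46}$$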


This shows in particular that, for any non-decreasing sequence $\{J_0(n), n \ge 0\}$ such that $m^{-1} + m/n \to 0$, $\hat d_n(w) = \hat d_n(J_0(n), w)$ is a consistent estimator of $d$. If the regularity exponent $\beta$ is known, it is possible to choose $J_0(n)$ to balance these two terms, that is, set $(m/n)^{2\beta} \asymp m^{-1}$ or equivalently $2^{J_0(n)} \asymp n^{1/(1+2\beta)}$ as $n \to \infty$. If we choose $J_0(n)$ in such a way, (43) and (46) imply

$$\limsup_{n \to \infty} \; \sup_{d \in [d_{\min}, d_{\max}]} \; \sup_{f^* \in \mathcal{H}(\beta, L)} n^{2\beta/(1+2\beta)} \, E\left[ \left( \hat d_n(w) - d \right)^2 \right] < \infty .$$

As shown in Giraitis et al. (1997), $n^{2\beta/(1+2\beta)}$ is the minimax rate of convergence for the memory parameter $d$ in this semi-parametric setting. Therefore,

Corollary 5. The wavelet estimator is rate optimal in the minimax sense.
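The balancing rule $2^{J_0(n)} \asymp n^{1/(1+2\beta)}$ can be turned into a simple rule of thumb. The sketch below takes the proportionality constant equal to one, which is an assumption (the text only fixes the order of magnitude); the function names are hypothetical.

```python
import math

def balanced_lower_scale(n, beta):
    """A J_0(n) with 2^{J_0} of order n^{1/(1 + 2 beta)} (constant set to 1)."""
    return round(math.log2(n) / (1.0 + 2.0 * beta))

def mse_rate(n, beta):
    """Order n^{-2 beta / (1 + 2 beta)} of the mean square error under this choice."""
    return n ** (-2.0 * beta / (1.0 + 2.0 * beta))

# Example with a sample size of 2^15 and the smoothest case beta = 2.
j0 = balanced_lower_scale(2 ** 15, 2.0)
rate = mse_rate(2 ** 15, 2.0)
```

In practice $\beta$ is rarely known, so such a rule should be read as a description of the best achievable trade-off, not as a data-driven selection procedure.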
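We shall now obtain the asymptotic behavior of $\mathrm{Var}[\hat d_n(w)]$ as $n \to \infty$.

Theorem 6. Assume that (W-1)-(W-4) hold with $M \ge 1$, $\alpha > 1$ and that $X$ is Gaussian. Let $w$ be a vector satisfying (41) for some $\ell \ge 1$. Let $\{J_0(n), n \in \mathbb{N}\}$ be a sequence such that $m \to \infty$ as $n \to \infty$. For any $f^* \in \mathcal{H}(\beta, L)$, where $\beta \in (0, 2]$, and $d \in ((1 + \beta)/2 - \alpha, M]$,

$$\lim_{n \to \infty} m \, \mathrm{Var}[\hat d_n(w)] = (2 - 2^{-\ell}) \, w^T V(d, \psi) \, w , \tag{47}$$

where $V(d, \psi)$ is the $(1 + \ell) \times (1 + \ell)$ matrix defined as

$$V_{i,j}(d, \psi) \stackrel{\mathrm{def}}{=} \frac{4\pi \, 2^{2d|i-j|} \, 2^{i \wedge j}}{K(d, \psi)^2} \int_{-\pi}^{\pi} \left| D_{\infty, |i-j|}(\lambda; d, \psi) \right|^2 d\lambda , \qquad 0 \le i, j \le \ell . \tag{48}$$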

Remark 14. The asymptotic expression of the variance (47) is a quadratic form of $w$ defined by the matrix $V(d, \psi)$, which depends only on $d$ and $\psi$ (see (48)). The standard theory of linear regression shows that, for any $\ell \ge 1$, the optimal regression vector of length $\ell + 1$ is

$$w_{\mathrm{opt}}(d, \psi) = V^{-1}(d, \psi) B \left( B^T V^{-1}(d, \psi) B \right)^{-1} b$$

and the associated limiting variance is $(2 - 2^{-\ell}) \, b^T (B^T V^{-1}(d, \psi) B)^{-1} b$. This optimal regression vector cannot be used directly since it depends on $d$, which is unknown, but one may apply a two-step procedure using a preliminary estimate of $d$ as in Bardet (2002) in a similar context.

Remark 15. When computing confidence intervals in practice, one sometimes uses asymptotic variances in (47) with $V(0, \psi)$ instead of $V(d, \psi)$, see e.g. Abry and Veitch (1998a). The expression $V^{-1}(0, \psi)$ can be easily obtained if the wavelets are orthonormal. In this case, by (48), for $i \ne j$, $V_{i,j}(0) = 0$ and $V_{j,j}(0) = 8\pi^2 2^j / K(0)^2 = 2^{j+1}$ since by (27), $K(0) = \int_{\mathbb{R}} |\hat\psi(\xi)|^2 d\xi = 2\pi \int_{\mathbb{R}} |\psi(t)|^2 dt = 2\pi$. Then (47) becomes

$$\lim_{n \to \infty} m \, \mathrm{Var}[\hat d_n(w)] = (2 - 2^{-\ell}) \, 2 \sum_{j=0}^{\ell} w_j^2 \, 2^j .$$

One can reformulate this in terms of $n_{J_0} \sim n 2^{-J_0}$ instead of $m$. In view of (38) and (43), one gets the following simple expression of the asymptotic variance when $d = 0$:

$$\lim_{n \to \infty} n_{J_0(n)} \mathrm{Var}[\hat d_n(w)] = 2 \sum_{j=0}^{\ell} w_j^2 \, 2^j .$$
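The $d = 0$ variance formula for orthonormal wavelets makes it easy to compare regression vectors. The sketch below evaluates $2 \sum_j w_j^2 2^j$ for ordinary least squares weights and for the weights obtained with $D = V^{-1}(0)$, a diagonal matrix with entries $2^{-(j+1)}$. The function names and the choice $\ell = 4$ are assumptions made for illustration.

```python
import math

def constrained_weights(diag_d):
    """w = D B (B^T D B)^{-1} b for a diagonal D (diagonal case only, by assumption)."""
    s0 = sum(diag_d)
    s1 = sum(i * di for i, di in enumerate(diag_d))
    s2 = sum(i * i * di for i, di in enumerate(diag_d))
    det = s0 * s2 - s1 * s1
    c = 1.0 / (2.0 * math.log(2.0))
    return [c * di * (i * s0 - s1) / det for i, di in enumerate(diag_d)]

def limiting_variance(w):
    """2 * sum_j w_j^2 2^j: the d = 0 limit of n_{J_0} Var for orthonormal wavelets."""
    return 2.0 * sum(wj * wj * 2.0 ** j for j, wj in enumerate(w))

ell = 4
w_ols = constrained_weights([1.0] * (ell + 1))
# D = V(0)^{-1}, with V_{jj}(0) = 2^{j+1} in the orthonormal case.
w_opt = constrained_weights([2.0 ** -(j + 1) for j in range(ell + 1)])
var_ols = limiting_variance(w_ols)
var_opt = limiting_variance(w_opt)
```

Since the second weight vector minimizes $w^T V(0) w$ under the two linear constraints on $w$, `var_opt` cannot exceed `var_ols`; the gap shows how much is lost at $d = 0$ by ignoring that coarse scales carry fewer coefficients.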


If we choose $m(n)$ (or $J_0(n)$) such that the bias in (44) is asymptotically negligible, then we can obtain the asymptotic behavior of the mean square error $E[(\hat d_n(w) - d)^2]$. In view of (44) and (47), we need $m \to \infty$ and $\{(m/n)^{\beta} + m^{-1}\}^2 \ll m^{-1}$, or equivalently

$$n 2^{-J_0(n)(1+2\beta)} + n^{-1} 2^{J_0(n)} \to 0 , \qquad n \to \infty . \tag{49}$$

Corollary 7. If (49) holds, then for $f^* \in \mathcal{H}(\beta, L)$ and $d \in ((1 + \beta)/2 - \alpha, M]$,

$$\lim_{n \to \infty} n 2^{-J_0(n)} \, E\left[ \left( \hat d_n(w) - d \right)^2 \right] = w^T V(d, \psi) \, w .$$


This result of course hints at the existence of a central limit theorem for the estimator $\hat d_n(w)$. Such a result can be obtained by using a central limit theorem for quadratic forms of Gaussian variables which is established in a companion paper, Moulines, Roueff and Taqqu (2005).


6. Proof of Theorem 3

From now on, we denote by $C$ constants possibly depending on $u$, $d_{\min}$, $d_{\max}$, $\beta$, $\phi$ and $\psi$, which may change from line to line, and we omit the dependence on $\phi$ and $\psi$ in the notations. We assume, without loss of generality, that $f^*(0) = 1$.

Proof of (a). In the expression of $\sigma_j^2(d, f^*)$, $j \ge 0$, we will approximate $|H_j(\lambda)|^2$ using (79). Thus define

$$A_j = 2^j \int_{-\pi}^{\pi} |1 - e^{-i\lambda}|^{-2d} f^*(\lambda) \, |\hat\phi(\lambda) \hat\psi(2^j \lambda)|^2 \, d\lambda \quad \text{and} \quad R_j = \sigma_j^2(d, f^*) - A_j .$$

By (79), we have

$$|R_j| \le C 2^{j(1 + M - \alpha)} \int_{-\pi}^{\pi} |1 - e^{-i\lambda}|^{-2d} f^*(\lambda) \, |\lambda|^{2M} (1 + 2^j |\lambda|)^{-\alpha - M} \, d\lambda . \tag{50}$$

We consider $A_j$ and $R_j$ separately, starting with $A_j$.

Express $A_j$ as

$$A_j = 2^j \int_{-\pi}^{\pi} g(\lambda) \, |\lambda|^{-2d} \, |\hat\psi(2^j \lambda)|^2 \, d\lambda , \tag{51}$$
where

$$g(\lambda) = f^*(\lambda) \, |\hat\phi(\lambda)|^2 \, \left| \frac{\lambda}{1 - e^{-i\lambda}} \right|^{2d} , \qquad \lambda \in (-\pi, \pi) . \tag{52}$$

Since $\hat\phi$ is infinitely differentiable by (W-1), $\lambda \mapsto |\hat\phi(\lambda)|^2 |\lambda/(1 - e^{-i\lambda})|^{2d}$ is infinitely differentiable on $[-\pi, \pi]$. Because $f^* \in \mathcal{H}(\beta, L)$ and $f^*(0) = 1$, there exists a constant $C$ such that for all $\lambda \in (-\pi, \pi)$,

$$|g(\lambda) - g(0)| \le C L |\lambda|^{\beta} , \tag{53}$$

where $g(0) = 1$ because $\hat\phi(0) = 1$ by Condition (W-1). Moreover, $A_j$ is finite since $g$ is bounded and $M > d - 1/2$. We shall now replace the function $g(\lambda)$ by the constant $g(0) = 1$ and extend the interval of integration from $[-\pi, \pi]$ to the whole real line in (51). Eqs. (51) and (53) imply

$$\left| A_j - 2^j \int_{-\pi}^{\pi} g(0) \, |\lambda|^{-2d} \, |\hat\psi(2^j \lambda)|^2 \, d\lambda \right| \le C L \, 2^j \int_{-\pi}^{\pi} |\lambda|^{\beta - 2d} \, |\hat\psi(2^j \lambda)|^2 \, d\lambda .$$

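First observe that, after a change of variable,

$$2^j \int_{-\infty}^{\infty} |\lambda|^{\beta - 2d} \, |\hat\psi(2^j \lambda)|^2 \, d\lambda \le 2^{j(2d - \beta)} \int_{-\infty}^{\infty} \left\{ |\lambda|^{\beta - 2 d_{\min}} \vee |\lambda|^{\beta - 2 d_{\max}} \right\} |\hat\psi(\lambda)|^2 \, d\lambda .$$

In the RHS of this inequality, using the behavior of $|\hat\psi(\lambda)|$ at infinity and at the origin implied by (W-2) and (W-3) respectively, and because $d_{\max} < M + 1/2$ and $d_{\min} > (1 + \beta)/2 - \alpha$, the integral is a finite constant. We further observe that, by (W-2), since $d_{\min} > 1/2 - \alpha$, we may write

$$2^j \int_{|\lambda| > \pi} |\lambda|^{-2d} \, |\hat\psi(2^j \lambda)|^2 \, d\lambda \le C 2^{j(1 - 2\alpha)} \int_{|\lambda| > \pi} |\lambda|^{-2(\alpha + d_{\min})} \, d\lambda ,$$

where the integral on the right is finite. Since $d > (1 + \beta)/2 - \alpha$ gives $1 - 2\alpha < 2d - \beta$, there exists a constant $C$ such that

$$\left| A_j - K(d) 2^{2jd} \right| \le C L \, 2^{(2d - \beta) j} , \tag{54}$$

where $K(d)$ is given by (27).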

We now compute a bound for $R_j$ using (50). Note that there exists a constant $C$ such that, for all $\lambda \in (-\pi, \pi)$,

$$f(\lambda) = f^*(\lambda) \left| \frac{\lambda}{1 - e^{-i\lambda}} \right|^{2d} |\lambda|^{-2d} \le C L |\lambda|^{-2d} . \tag{55}$$

Plugging this into (50) and then separating $\lambda < 1$ and $\lambda > 1$, we obtain

$$R_j \le C L \, 2^{2jd} 2^{-j(M + \alpha)} \int_0^{2^j \pi} \left\{ \lambda^{2(M - d_{\min})} \vee \lambda^{2(M - d_{\max})} \right\} (1 + \lambda)^{-\alpha - M} \, d\lambda$$

$$\le C L \, 2^{j(2d - \beta)} 2^{-j(M + \alpha - \beta)} \left\{ \int_0^1 \lambda^{2(M - d_{\max})} \, d\lambda + \int_1^{2^j \pi} \lambda^{M - 2 d_{\min} - \alpha} \, d\lambda \right\} .$$

Since $2(M - d_{\max}) > -1$, the first integral is a finite constant. Depending on whether $M - 2 d_{\min} - \alpha$ is less than, equal to or larger than $-1$, the second integral is bounded by a finite constant, $\log \pi + j \log 2$ or $C 2^{j(1 + M - 2 d_{\min} - \alpha)}$. In the two first cases, we simply observe that $M \ge 1$, $\alpha > 1$ and $\beta \le 2$ imply $M + \alpha - \beta > 0$, and in the last case that $-(M + \alpha - \beta) + 1 + M - 2 d_{\min} - \alpha = 1 - 2 d_{\min} - 2\alpha + \beta < 0$ since $d_{\min} > (1 + \beta)/2 - \alpha$, so that, in all cases, $R_j \le C L \, 2^{(2d - \beta) j}$. This bound, with (54), implies

$$\left| \sigma_j^2(d, f^*) - K(d) 2^{2jd} \right| = \left| A_j + R_j - K(d) 2^{2jd} \right| \le C L \, 2^{(2d - \beta) j} ,$$

which proves (a).


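Proof of (b). For ease of notation, we only consider the case $u = 0$ so that $e_u(\xi) = 1$. It is also enough to suppose $j \ge 1$. In (21), the summands are $2^j (2\pi)$-periodic; hence, omitting the summands, $\sum_{l=0}^{2^j - 1} = \sum_{l=0}^{2^{j-1} - 1} + \sum_{l=2^{j-1}}^{2^j - 1} = \sum_{l=0}^{2^{j-1} - 1} + \sum_{l=-2^{j-1}}^{-1} = \sum_{l=-2^{j-1}}^{2^{j-1} - 1}$.

Note that, for $l \in \{-2^{j-1}, \dots, 2^{j-1} - 1\}$ and $\lambda \in (0, \pi)$, we have $2^{-j}(\lambda + 2l\pi) \in (-\pi, \pi)$ so that (79) applies. Hence, $D_{j,0}(\lambda; d, f^*)$ in (21) is expressed as the sum of two functions $A_j(\lambda) + R_j(\lambda)$, defined for all $\lambda \in (0, \pi)$ by

$$A_j(\lambda) \stackrel{\mathrm{def}}{=} \sum_{l=-2^{j-1}}^{2^{j-1} - 1} \left| 2^{-j}(\lambda + 2l\pi) \right|^{-2d} g\left( 2^{-j}(\lambda + 2l\pi) \right) \left| \hat\psi(\lambda + 2l\pi) \right|^2 , \tag{56}$$

where $g$ is defined in (52) and where, by (55) and (79),

$$R_j(\lambda) \le C L \, 2^{j(2d - M - \alpha)} \sum_{l=-2^{j-1}}^{2^{j-1} - 1} |\lambda + 2l\pi|^{2(M - d)} (1 + |\lambda + 2l\pi|)^{-\alpha - M} . \tag{57}$$

From (53), we get, for all $\lambda \in (0, \pi)$,

$$\left| A_j(\lambda) - 2^{2dj} g(0) \sum_{l=-2^{j-1}}^{2^{j-1} - 1} |\lambda + 2l\pi|^{-2d} \left| \hat\psi(\lambda + 2l\pi) \right|^2 \right| \le C L \, 2^{(2d - \beta) j} B_j(\lambda) , \tag{58}$$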
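where, by (W-2) and (W-3), for all $\lambda \in (0, \pi)$,

$$B_j(\lambda) = \sum_{l=-2^{j-1}}^{2^{j-1} - 1} |\lambda + 2l\pi|^{\beta - 2d} \left| \hat\psi(\lambda + 2l\pi) \right|^2 \le C \left( |\lambda|^{\beta + 2(M - d)} + \sum_{|l| \ge 1} |\lambda + 2l\pi|^{\beta - 2d - 2\alpha} \right) \le C \left( 1 + 2 \sum_{l \ge 1} (2l - 1)^{\beta - 2 d_{\min} - 2\alpha} \right) < \infty \tag{59}$$

since $|\lambda + 2l\pi| \ge \pi(2|l| - 1)$, $\beta > 0$ and, by the assumed range of $d$, one has $M \ge d$ and $\beta - 2 d_{\min} - 2\alpha < -1$. By the same arguments, for all $\lambda \in (0, \pi)$,

$$\sum_{|l| \ge 2^{j-1}} |\lambda + 2l\pi|^{-2d} \left| \hat\psi(\lambda + 2l\pi) \right|^2 \le C 2^{j(1 - 2(d + \alpha))}$$

is bounded since the exponent is negative. The definition of $D_{\infty, u}$ with $u = 0$, (58), $g(0) = 1$ and the above inequalities yield that, for all $\lambda \in (0, \pi)$,

$$\left| A_j(\lambda) - D_{\infty, 0}(\lambda; d) \, 2^{2dj} \right| \le C L \, 2^{(2d - \beta) j} .$$

We now turn to bounding $R_j(\lambda)$ using (57). For all $\lambda \in (0, \pi)$, using $|\lambda + 2l\pi| \ge \pi(2|l| - 1)$ and $d \ge d_{\min}$,

$$R_j(\lambda) \le C L \, 2^{j(2d - \beta)} 2^{-j(M + \alpha - \beta)} \left( 1 + \sum_{l=1}^{2^j} l^{-2 d_{\min} + M - \alpha} \right) ,$$

which can be bounded as in the proof of (a) by considering the cases $M - 2 d_{\min} - \alpha <$, $=$ or $> -1$.

The joint continuity of $(\lambda, d) \mapsto D_{\infty, 0}(\lambda; d, \psi)$ on $\mathbb{R} \times [d_{\min}, d_{\max}]$ follows from the above bounds and dominated convergence.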


7. Proofs of Theorem 4 and Theorem 6

From now on, we denote by $J_{\min}$, $n_{\min}$, $C$ and $C'$ some positive constants whose values may change upon each occurrence and which depend at most on $w$, $\beta$, $L$, $d_{\min}$, $d_{\max}$, $\phi$, and $\psi$. We will repeatedly use that, by (38), (39) and (43), for $J_0(n) \in \{0, \dots, J(n) - \ell\}$,

$$n_{J_0(n)} \asymp n_{J_0(n) + \ell} \asymp n 2^{-J_0(n)} \asymp m(n) , \tag{60}$$

where $a \asymp b$ means that there is a constant $C \ge 1$ such that $a/C \le b \le C a$. Finally, for any measurable vector-valued function $\varphi$ on $[-\pi, +\pi]$ and any $p > 0$, $\|\varphi\|_p = \left( \int_{-\pi}^{\pi} |\varphi(\lambda)|^p \, d\lambda \right)^{1/p}$.

Proposition 8. Let $\hat\sigma_j^2$ be defined as in (37) and let $D_{\infty, u}(\lambda; d, \psi)$, $u \ge 0$, be as defined previously. Under the assumptions of Theorem 4, one has, as $j \to \infty$ and $n_j \to \infty$,

$$2^{-4dj} \, n_{j-u} \, \mathrm{Cov}\left( \hat\sigma_j^2, \hat\sigma_{j-u}^2 \right) \to (f^*(0))^2 \, 4\pi \, \| D_{\infty, u}(\cdot; d, \psi) \|_2^2 ,$$

uniformly on $d \in [d_{\min}, d_{\max}]$ and $f^* \in \mathcal{H}(\beta, L)$.


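Proof. We set $f^*(0) = 1$ without loss of generality. Using (37) and (20), we write

$$\mathrm{Cov}\left[ \hat\sigma_j^2, \hat\sigma_{j-u}^2 \right] = \frac{1}{n_j n_{j-u}} \sum_{k,l=0}^{n_j - 1} \sum_{v=0}^{2^u - 1} \mathrm{Cov}\left[ W_{j,k}^2, W_{j-u, 2^u l + v}^2 \right]$$

$$= \frac{2}{n_j n_{j-u}} \sum_{k,l=0}^{n_j - 1} \left| \mathrm{Cov}\left[ W_{j,k}, W_{j-u,l}(u) \right] \right|^2 \tag{61}$$

$$= \frac{2}{n_{j-u}} \sum_{\tau \in \mathbb{Z}} \left( 1 - \frac{|\tau|}{n_j} \right)_+ \left| \mathrm{Cov}\left[ W_{j,0}, W_{j-u,\tau}(u) \right] \right|^2 , \tag{62}$$

where, in (61), we used the fact that if the scalar $X$ and the vector $Y = [Y_1 \dots Y_p]^T$ are jointly Gaussian,

$$\sum_{k=1}^{p} \mathrm{Cov}(X^2, Y_k^2) = 2 \sum_{k=1}^{p} \mathrm{Cov}^2(X, Y_k) = 2 \, |\mathrm{Cov}(X, Y)|^2 .$$

Using the notation $M_n$ defined in (81), we have

$$\left( \sum_{\tau \in \mathbb{Z}} \left( 1 - \frac{|\tau|}{n_j} \right)_+ \left| \mathrm{Cov}\left( W_{j,0}, W_{j-u,\tau}(u) \right) \right|^2 \right)^{1/2} = M_{n_j}\left( D_{j,u}(\cdot; d, f^*) \right) , \tag{63}$$

since $D_{j,u}(\cdot; d, f^*)$ is the cross-spectral density of the vector $[W_{j,0}, W_{j-u,\tau}(u)]$. Applying Lemma 11-(82), the relation $\| \cdot \|_2 \le \sqrt{2\pi} \, \| \cdot \|_\infty$ and Theorem 3-(29), there is a constant $C$ such that

$$\left| M_{n_j}\left( D_{j,u}(\cdot; d, f^*) \right) - 2^{2jd} M_{n_j}\left( D_{\infty, u}(\cdot; d) \right) \right| \le C 2^{(2d - \beta) j} . \tag{64}$$

On the other hand, by Lemma 11-(83), we have, as $n_j \to \infty$,

$$M_{n_j}\left( D_{\infty, u}(\cdot; d) \right) \to \sqrt{2\pi} \, \| D_{\infty, u}(\cdot; d) \|_2 . \tag{65}$$

The convergence in (65) holds uniformly on $d \in [d_{\min}, d_{\max}]$ because of the joint continuity of $(\lambda, d) \mapsto D_{\infty, u}(\lambda; d)$ stated in Theorem 3.

The result follows from (61)-(65). □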

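Proof of Theorem 4. Again, we set $f^*(0) = 1$ without loss of generality. The bias $E[\hat d_n(J_0, w)] - d$ can be decomposed into two terms as follows:

$$\sum_{j=J_0}^{J_0+\ell} w_{j-J_0} E\left[ \log(\hat\sigma_j^2) \right] - d = \sum_{j=J_0}^{J_0+\ell} w_{j-J_0} \log\left[ \sigma_j^2(d, f^*) \right] - d + \sum_{j=J_0}^{J_0+\ell} w_{j-J_0} \left\{ E\left[ \log(\hat\sigma_j^2) \right] - \log\left( E[\hat\sigma_j^2] \right) \right\} , \tag{66}$$

where $\hat\sigma_j^2$ is the wavelet coefficient empirical variance (37) and $E[\hat\sigma_j^2] = \sigma_j^2(d, f^*)$.

Using (41), the first term on the RHS of (66) may be rewritten as

$$\sum_{j=J_0}^{J_0+\ell} w_{j-J_0} \log\left[ \sigma_j^2(d, f^*) \right] - d = \sum_{j=J_0}^{J_0+\ell} w_{j-J_0} \log\left( 1 + \frac{\sigma_j^2(d, f^*) - K(d) 2^{2dj}}{K(d) 2^{2dj}} \right) . \tag{67}$$

By Theorem 3(a) and using that $\inf_{d \in [d_{\min}, d_{\max}]} K(d) > 0$, there exists a constant $C$ such that

$$\frac{\left| \sigma_j^2(d, f^*) - K(d) 2^{2dj} \right|}{K(d) 2^{2dj}} \le C 2^{-\beta j} .$$

Using that $|\log(1 + x)| \le 2|x|$ for $x \in (-1/2, \infty)$, there is a $J_{\min}$ such that, for $j \ge J_{\min}$, the logarithm in the RHS of (67) is bounded by $C 2^{-\beta j}$, and by (60), for all $J_0 \ge J_{\min}$,

$$\left| \sum_{j=J_0}^{J_0+\ell} w_{j-J_0} \log\left[ \sigma_j^2(d, f^*) \right] - d \right| \le C \sum_{j=0}^{\ell} 2^{-\beta(J_0 + j)} \le C \left( \frac{m}{n} \right)^{\beta} . \tag{68}$$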


This bound is in fact valid for all $J_0 \ge 0$ because $\sigma_j^2(d, f^*)$ is bounded away from zero and infinity independently of $d$ and $f^*$. Indeed, since $f^* \in \mathcal{H}(\beta, L)$ with $f^*(0) = 1$, there is a small enough $\epsilon > 0$ only depending on $d_{\min}$, $d_{\max}$, $\beta$ and $L$ such that

$$C \int_{-\epsilon}^{\epsilon} |\lambda|^{-2 d_{\min}} |H_j(\lambda)|^2 \, d\lambda \le \int_{-\epsilon}^{\epsilon} |1 - e^{-i\lambda}|^{-2d} (1 - L|\lambda|^{\beta})_+ |H_j(\lambda)|^2 \, d\lambda \le \sigma_j^2(d, f^*)$$

$$\le \int_{-\pi}^{\pi} |1 - e^{-i\lambda}|^{-2d} (1 + L|\lambda|^{\beta}) |H_j(\lambda)|^2 \, d\lambda \le C' \int_{-\pi}^{\pi} |\lambda|^{-2 d_{\max}} |H_j(\lambda)|^2 \, d\lambda . \tag{69}$$

Observe that the lower bound in the previous display does not vanish since, as remarked earlier, $H_j(\lambda)$ is a non-zero trigonometric polynomial for all $j \ge 0$, and that the upper bound is finite since $H_j(\lambda) = O(|\lambda|^M)$. Hence there is a positive constant $C$ such that, for all $j \in \{0, \dots, J_{\min}\}$, $C^{-1} \le \sigma_j^2(d, f^*) \le C$.

We now consider the second term in the RHS of the display (66). The empirical variance (37) is a quadratic form in the wavelet coefficients $[W_{j,0}, \dots, W_{j,n_j - 1}]$. These have spectral density $D_{j,0}(\cdot; d, f^*)$, given in (21). By Lemma 12 the spectral radius of the covariance matrix $\Gamma_j(d, f^*)$ of the random vector $[W_{j,0}, \dots, W_{j,n_j - 1}]$ is bounded by the supremum of the spectral density,

$$\rho[\Gamma_j(d, f^*)] \le 2\pi \, \| D_{j,0}(\cdot; d, f^*) \|_\infty . \tag{70}$$

Applying Proposition 13-(84) with $A = n_j^{-1} I_{n_j}$ and $\Gamma = \Gamma_j(d, f^*)$ and using (70), we get

$$\left| E\left[ \log(\hat\sigma_j^2) \right] - \log\left[ E(\hat\sigma_j^2) \right] \right| \le 4\pi^2 C \left( 1 \wedge \frac{n_j^{-1} \| D_{j,0}(\cdot; d, f^*) \|_\infty^2}{n_j \mathrm{Var}[\hat\sigma_j^2]} \right) , \tag{71}$$

where $C$ is a universal constant. Now, by Theorem 3-(29) and by joint continuity of $D_{\infty, 0}(\lambda, d)$,

$$2^{-2dj} \| D_{j,0}(\cdot; d, f^*) \|_\infty \le C \left( 1 + \| D_{\infty, 0}(\cdot; d) \|_\infty \right) \le C .$$

It follows from Proposition 8 that for $j \ge J_{\min}$ and $n_j \ge n_{\min}$,

$$2^{-4dj} \, n_j \mathrm{Var}(\hat\sigma_j^2) \ge 2\pi \inf_{d \in [d_{\min}, d_{\max}]} \| D_{\infty, 0}(\cdot; d) \|_2^2 ,$$

which is positive, as remarked earlier. The last two displayed equations imply that for $j \ge J_{\min}$ and $n_j \ge n_{\min}$,

$$\frac{\| D_{j,0}(\cdot; d, f^*) \|_\infty^2}{n_j \mathrm{Var}[\hat\sigma_j^2]} \le C . \tag{72}$$

Inserting (72) into (71) and using (60), we get that for $J_0 \ge J_{\min}$ and $n_{J_0 + \ell} \ge n_{\min}$, and $j = J_0, \dots, J_0 + \ell$,

$$\left| E\left[ \log(\hat\sigma_j^2) \right] - \log\left( E[\hat\sigma_j^2] \right) \right| \le C n_j^{-1} \le C n_{J_0 + \ell}^{-1} \le C m^{-1} \le C \left( m^{-1} + (m/n)^{\beta} \right) . \tag{73}$$

This last bound holds in fact without the preceding restrictions on $J_0$ and $n_{J_0 + \ell}$. To see this, use (71) (with the "bound 1") and observe that, by (60), $J_0 < J_{\min}$ implies $2^{-J_0} > 2^{-J_{\min}}$, that is, $m/n \ge C$, and $n_{J_0 + \ell} < n_{\min}$ implies $m^{-1} \ge C$. The bounds (68) and (73), inserted in (66), yield the bound (44) on the bias.
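We now compute the variance of the estimator $\hat d_n(J_0, w)$. By Proposition 13-(86) and using (70) and (72) as in (73),

$$\left| \mathrm{Var}(\hat d_n(J_0, w)) - \sum_{i,j=J_0}^{J_0+\ell} w_{i-J_0} w_{j-J_0} \frac{\mathrm{Cov}(\hat\sigma_i^2, \hat\sigma_j^2)}{E[\hat\sigma_i^2] \, E[\hat\sigma_j^2]} \right| \le \sum_{i,j=J_0}^{J_0+\ell} |w_{i-J_0} w_{j-J_0}| \left| \mathrm{Cov}\left( \log(\hat\sigma_i^2), \log(\hat\sigma_j^2) \right) - \frac{\mathrm{Cov}(\hat\sigma_i^2, \hat\sigma_j^2)}{E[\hat\sigma_i^2] \, E[\hat\sigma_j^2]} \right|$$

$$\le C \sum_{i,j=J_0}^{J_0+\ell} \left\{ \frac{n_i^{-3} \| D_{i,0}(\cdot; d, f^*) \|_\infty^3}{\mathrm{Var}^{3/2}(\hat\sigma_i^2)} \vee \frac{n_j^{-3} \| D_{j,0}(\cdot; d, f^*) \|_\infty^3}{\mathrm{Var}^{3/2}(\hat\sigma_j^2)} \right\} \le C n_{J_0 + \ell}^{-3/2} \le C m^{-3/2} = o(m^{-1}) . \tag{74}$$

On the other hand, by Proposition 8 and Theorem 3(a), we have, for any $u \ge 0$, as $j_0 \to \infty$ and $n_{j_0} \to \infty$,

$$\frac{n_{j_0 - u} \mathrm{Cov}(\hat\sigma_{j_0}^2, \hat\sigma_{j_0 - u}^2)}{E[\hat\sigma_{j_0}^2] \, E[\hat\sigma_{j_0 - u}^2]} = \frac{n_{j_0 - u} \mathrm{Cov}(\hat\sigma_{j_0}^2, \hat\sigma_{j_0 - u}^2)}{\sigma_{j_0}^2(d, f^*) \, \sigma_{j_0 - u}^2(d, f^*)} \to \frac{4\pi \, \| D_{\infty, u}(\cdot; d) \|_2^2 \, 2^{2du}}{(K(d))^2} \tag{75}$$

uniformly in $d \in [d_{\min}, d_{\max}]$ and $f^* \in \mathcal{H}(\beta, L)$. Applying (75) with $j_0 = i \vee j$ and $u = |i - j|$ so that $j_0 - u = i \wedge j$, and since $\inf_{d \in [d_{\min}, d_{\max}]} K(d) > 0$, for all $J_0 \ge J_{\min}$ and $n_{J_0 + \ell} \ge n_{\min}$,

$$\left| \sum_{i,j=J_0}^{J_0+\ell} w_{i-J_0} w_{j-J_0} \frac{\mathrm{Cov}[\hat\sigma_i^2, \hat\sigma_j^2]}{\sigma_i^2(d, f^*) \, \sigma_j^2(d, f^*)} \right| \le C n_{J_0 + \ell}^{-1} \le C m^{-1} .$$

This bound with (74) yields (45) for $J_0 \ge J_{\min}$ and $n_{J_0 + \ell} \ge n_{\min}$. When $J_0 < J_{\min}$ or $n_{J_0 + \ell} < n_{\min}$, the RHS of (45) is larger than a positive constant and it suffices to use that, by the Minkowski inequality,

$$\mathrm{Var}^{1/2}(\hat d_n(J_0, w)) \le \sum_{j=J_0}^{J_0+\ell} |w_{j-J_0}| \, \mathrm{Var}^{1/2}\left( \log(\hat\sigma_j^2) \right) \le C ,$$

where we applied Proposition 13-(85) and that, by (69), $E[\hat\sigma_j^2] = \sigma_j^2(d, f^*)$ does not vanish. □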


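Proof of Theorem 6. By (38) and (43), for $i, j = J_0, \dots, J_0 + \ell$,

$$n_{i \wedge j}^{-1} \sim \left( n 2^{-(i \wedge j)} \right)^{-1} \sim m^{-1} (2 - 2^{-\ell}) \, 2^{(i \wedge j) - J_0} .$$

Hence, by (74) and (75),

$$\lim_{n \to \infty} m \mathrm{Var}(\hat d_n(J_0, w)) = (2 - 2^{-\ell}) \sum_{i,j=J_0}^{J_0+\ell} w_{i-J_0} w_{j-J_0} \, 2^{(i - J_0) \wedge (j - J_0)} \, \frac{4\pi \, 2^{2d|i-j|} \, \| D_{\infty, |i-j|}(\cdot; d) \|_2^2}{K(d)^2} ,$$

which gives (47) after a change of variables. □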

Appendix A. Approximation of wavelet filter transfer functions 


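Proposition 9. Under (W-1)-(W-4), there exist positive constants $C_i$, $i = 1, \dots, 4$, only depending on $\phi$ and $\psi$, such that, for all $j \ge 0$ and $\lambda \in (-\pi, \pi)$,

$$\left| H_j(\lambda) - 2^{j/2} \hat\phi(\lambda) \overline{\hat\psi(2^j \lambda)} \right| \le C_1 2^{j(1/2 - \alpha)} |\lambda|^M , \tag{76}$$

$$\left| \hat\phi(\lambda) \hat\psi(2^j \lambda) \right| \le C_2 |2^j \lambda|^M (1 + 2^j |\lambda|)^{-\alpha - M} , \tag{77}$$

$$|H_j(\lambda)| \le C_3 2^{j/2} |2^j \lambda|^M (1 + 2^j |\lambda|)^{-\alpha - M} , \tag{78}$$

$$\left| |H_j(\lambda)|^2 - 2^j \left| \hat\phi(\lambda) \hat\psi(2^j \lambda) \right|^2 \right| \le C_4 2^{j(1 + M - \alpha)} |\lambda|^{2M} (1 + 2^j |\lambda|)^{-\alpha - M} . \tag{79}$$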

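Proof. Under (W-1) and (W-2), we have that, for all $t \in \mathbb{R}$, $\lambda \mapsto \sum_{k \in \mathbb{Z}} \hat\phi(\lambda + 2k\pi) e^{it(\lambda + 2k\pi)}$ is a $2\pi$-periodic function, integrable on $(-\pi, \pi)$ and whose $l$-th Fourier coefficient is

$$\int_{-\pi}^{\pi} \sum_{k \in \mathbb{Z}} \hat\phi(\lambda + 2k\pi) e^{it(\lambda + 2k\pi)} e^{-i\lambda l} \, d\lambda = \int_{-\infty}^{\infty} \hat\phi(\lambda) e^{it\lambda} e^{-i\lambda l} \, d\lambda = 2\pi \, \phi(t - l) .$$

It follows that, for all $\lambda$ and $t$ in $\mathbb{R}$,

$$\sum_{l \in \mathbb{Z}} \phi(t - l) e^{i\lambda l} = \sum_{k \in \mathbb{Z}} \hat\phi(\lambda + 2k\pi) e^{it(\lambda + 2k\pi)} ,$$

which is a form of the Poisson summation formula. Inserting this in the definition of $H_j$ gives

$$H_j(\lambda) = 2^{-j/2} \int_{-\infty}^{\infty} \left( \sum_{k \in \mathbb{Z}} \hat\phi(\lambda + 2k\pi) e^{it(\lambda + 2k\pi)} \right) \psi(2^{-j} t) \, dt = 2^{-j/2} \sum_{k \in \mathbb{Z}} \hat\phi(\lambda + 2k\pi) \int_{-\infty}^{\infty} e^{it(\lambda + 2k\pi)} \psi(2^{-j} t) \, dt$$

$$= 2^{j/2} \sum_{k \in \mathbb{Z}} \hat\phi(\lambda + 2k\pi) \overline{\hat\psi(2^j(\lambda + 2k\pi))} .$$

From this expression of $H_j$, we get, for all $j \ge 0$ and $\lambda \in (-\pi, \pi)$,

$$\left| H_j(\lambda) - 2^{j/2} \hat\phi(\lambda) \overline{\hat\psi(2^j \lambda)} \right| = 2^{j/2} \left| \sum_{|k| \ge 1} \hat\phi(\lambda + 2k\pi) \overline{\hat\psi(2^j(\lambda + 2k\pi))} \right| . \tag{80}$$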
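Now, using successively the behavior of $\hat\phi$ near the non-zero multiples of $2\pi$ and (W-2), there is a constant $C$ such that, for all non-zero integer $k$ and all $\lambda \in (-\pi, \pi)$, $|\hat\phi(\lambda + 2k\pi)| \le C |\lambda|^M$ and

$$\left| \hat\psi(2^j(\lambda + 2k\pi)) \right| \le C (2^j |\lambda + 2k\pi|)^{-\alpha} \le C 2^{-\alpha j} (2|k|\pi - |\lambda|)^{-\alpha} \le \frac{C 2^{-\alpha j}}{\pi^{\alpha} (2|k| - 1)^{\alpha}} .$$

Inserting these bounds into (80) gives (76).

The bound (77) follows from (W-1) ($|\hat\phi(\xi)| \le \int_{-\infty}^{\infty} |\phi(t)| \, dt < \infty$), (W-2) ($|\hat\psi(\xi)| \le C (1 + |\xi|)^{-\alpha}$) and (W-3).

The two last bounds (78) and (79) follow from the two first (76) and (77). Indeed, let $H_j^{(0)}(\lambda) = 2^{j/2} \hat\phi(\lambda) \overline{\hat\psi(2^j \lambda)}$. For (78) we write

$$|H_j(\lambda)| \le \left| H_j(\lambda) - H_j^{(0)}(\lambda) \right| + \left| H_j^{(0)}(\lambda) \right| .$$

Applying (76) and (77), the RHS of this equation is bounded by

$$C_1 2^{j(1/2 - \alpha)} |\lambda|^M + C_2 2^{j/2} |2^j \lambda|^M (1 + 2^j |\lambda|)^{-\alpha - M} \le 2^{j/2} |2^j \lambda|^M (1 + 2^j |\lambda|)^{-\alpha - M} \left( C_1 2^{-j(\alpha + M)} (1 + 2^j |\lambda|)^{\alpha + M} + C_2 \right) .$$

By observing that, for all $j \ge 0$ and $\lambda \in (-\pi, \pi)$, the last term in parentheses is bounded by $C_1 2^{-j(\alpha + M)} (2^{1+j} \pi)^{\alpha + M} + C_2 \le C_1 (2\pi)^{\alpha + M} + C_2$, we get (78). For (79), we write

$$\left| |H_j(\lambda)|^2 - |H_j^{(0)}(\lambda)|^2 \right| \le \left| H_j(\lambda) - H_j^{(0)}(\lambda) \right| \left( |H_j^{(0)}(\lambda)| + |H_j(\lambda)| \right)$$

and apply (76), (77) and (78). □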


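Corollary 10. Under (W-1)-(W-4), there exists $j_0 \ge 0$ such that, for all $j \ge j_0$, $H_j$ is not identically zero.

Proof. By (W-1), there exist sufficiently small positive constants $\epsilon$ and $\eta$ such that $|\hat\phi(\lambda)| \ge 1/2$ for all $|\lambda| \le \epsilon$ and $\inf_{|\lambda| \le \epsilon^{-1}} |\hat\psi(\lambda)| \ge \eta$. Hence for all $j$ such that $2^j \epsilon \ge \epsilon^{-1}$, using (76),

$$\inf_{|\lambda| \le \epsilon} 2^{-j/2} |H_j(\lambda)| \ge \inf_{|\lambda| \le \epsilon} \left| \hat\phi(\lambda) \hat\psi(2^j \lambda) \right| - C_1 2^{-j\alpha} \sup_{|\lambda| \le \epsilon} |\lambda|^M \ge \eta/2 - C_1 2^{-j\alpha} \epsilon^M ,$$

which is positive for $j$ large enough. □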
Appendix B. Some useful inequalities

Lemma 11. Let $p$ be a positive integer. For all $\mathbb{C}^p$-valued function $g \in L^2(-\pi, \pi)$ and $n \ge 1$, define

$$M_n(g) \stackrel{\mathrm{def}}{=} \left( \sum_{k \in \mathbb{Z}} \left( 1 - \frac{|k|}{n} \right)_+ |c_k|^2 \right)^{1/2} , \tag{81}$$

where $| \cdot |$ denotes the Euclidean norm in any dimension and $c_k = \int_{-\pi}^{\pi} g(\lambda) e^{ik\lambda} \, d\lambda$. Then, for all $g_1$ and $g_2$ in $L^2(-\pi, \pi)$,

$$|M_n(g_1) - M_n(g_2)| \le \sqrt{2\pi} \left( \int_{-\pi}^{\pi} |g_1(\lambda) - g_2(\lambda)|^2 \, d\lambda \right)^{1/2} . \tag{82}$$

Moreover, for all $g$ in $L^2(-\pi, \pi)$, as $n \to \infty$,

$$M_n(g) \to \sqrt{2\pi} \left( \int_{-\pi}^{\pi} |g(\lambda)|^2 \, d\lambda \right)^{1/2} . \tag{83}$$

Proof. Suppose $p = 1$ (the proof for $p > 1$ is identical). Observe that the RHS in (81) is a norm on $(c_k)_{k \in \mathbb{Z}} \in \ell^2(\mathbb{Z})$ which is bounded by the $\ell^2$ norm $\left( \sum_k |c_k|^2 \right)^{1/2}$. Thus, by the Parseval Theorem, $M_n(g)$ is a norm on $g \in L^2(-\pi, \pi)$ which is bounded by $\sqrt{2\pi} \, \|g\|_2$. Hence $|M_n(g_1) - M_n(g_2)| \le M_n(g_1 - g_2) \le \sqrt{2\pi} \, \|g_1 - g_2\|_2$. Finally, (83) is obtained by dominated convergence. □


Denote by $\mathrm{Tr}(A)$ and $\rho(A)$ the trace and the spectral radius of a matrix $A$. Recall that $\rho(A)$ is the maximum of the modulus of the eigenvalues of $A$.


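Lemma 12. Let $\{\xi_l, l \in \mathbb{Z}\}$ be a stationary process with spectral density $g$ and let $\Gamma_n$ be the covariance matrix of $[\xi_1, \dots, \xi_n]$. Then, $\rho(\Gamma_n) \le 2\pi \|g\|_\infty$.

Proof. Since $\Gamma_n$ is a non-negative definite matrix, $\rho(\Gamma_n) = \sup_{x \in \mathbb{R}^n, |x| \le 1} x^T \Gamma_n x$, where $|x|$ is the Euclidean norm of $x$. For all $x \in \mathbb{R}^n$, we may write

$$x^T \Gamma_n x = \int_{-\pi}^{\pi} g(\lambda) \left| \sum_{l=1}^{n} x_l e^{-il\lambda} \right|^2 d\lambda \le \|g\|_\infty \int_{-\pi}^{\pi} \left| \sum_{l=1}^{n} x_l e^{-il\lambda} \right|^2 d\lambda = 2\pi \|g\|_\infty |x|^2 .$$

□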
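Lemma 12 can be sanity-checked on a process whose covariances are known in closed form. The sketch below uses an MA(1) process $\xi_t = e_t + \theta e_{t-1}$ with unit-variance white noise, for which $g(\lambda) = |1 + \theta e^{-i\lambda}|^2 / (2\pi)$ and hence $2\pi \|g\|_\infty = (1 + |\theta|)^2$. The choice of process, the function names and the test vectors are assumptions made for illustration only.

```python
def ma1_quadratic_form(x, theta):
    """x^T Gamma_n x for the MA(1) process xi_t = e_t + theta * e_{t-1}."""
    g0, g1 = 1.0 + theta * theta, theta  # autocovariances at lags 0 and 1
    n = len(x)
    diag = g0 * sum(v * v for v in x)
    off = 2.0 * g1 * sum(x[i] * x[i + 1] for i in range(n - 1))
    return diag + off

def ma1_bound(theta):
    """2 * pi * sup-norm of the MA(1) spectral density."""
    return (1.0 + abs(theta)) ** 2

theta, n = 0.5, 50
candidates = [
    [1.0] * n,                        # constant vector
    [(-1.0) ** i for i in range(n)],  # alternating signs
    [1.0] + [0.0] * (n - 1),          # first basis vector
]
ok = all(
    ma1_quadratic_form(x, theta) <= ma1_bound(theta) * sum(v * v for v in x)
    for x in candidates
)
```

The constant vector comes close to the bound (the ratio tends to $(1 + \theta)^2$ as $n$ grows), which suggests that the inequality of the lemma cannot be improved in general.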


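Proposition 13. Let $\xi$ be a zero-mean $n \times 1$ Gaussian vector with covariance $\Gamma$. Then there exists a universal constant $C$ independent of $n$ such that for any $n \times n$ non-negative symmetric matrix $A$ satisfying $\mathrm{Tr}(A\Gamma) > 0$,

$$\left| E\left( \log[\xi^T A \xi] \right) - \log\left( E[\xi^T A \xi] \right) \right| \le C \left( 1 \wedge \frac{\rho^2(A) \rho^2(\Gamma)}{\mathrm{Var}(\xi^T A \xi)} \right) ; \tag{84}$$

$$\mathrm{Var}\left( \log[\xi^T A \xi] \right) \le C . \tag{85}$$

Let $[\xi^T, \tilde\xi^T]^T$ be a zero-mean $(n + \tilde n) \times 1$ Gaussian vector such that $\mathrm{Cov}(\xi) = \Gamma$ and $\mathrm{Cov}(\tilde\xi) = \tilde\Gamma$. Then there exists a universal constant $C$ independent of $n$ and $\tilde n$ such that for any $n \times n$ and $\tilde n \times \tilde n$ non-negative symmetric matrices $A$ and $\tilde A$ satisfying $\mathrm{Tr}(A\Gamma) > 0$ and $\mathrm{Tr}(\tilde A \tilde\Gamma) > 0$,

$$\left| \mathrm{Cov}\left( \log[\xi^T A \xi], \log[\tilde\xi^T \tilde A \tilde\xi] \right) - \frac{\mathrm{Cov}\left( \xi^T A \xi, \tilde\xi^T \tilde A \tilde\xi \right)}{E[\xi^T A \xi] \, E[\tilde\xi^T \tilde A \tilde\xi]} \right| \le C \left\{ \frac{\rho^3(A) \rho^3(\Gamma)}{\mathrm{Var}^{3/2}(\xi^T A \xi)} \vee \frac{\rho^3(\tilde A) \rho^3(\tilde\Gamma)}{\mathrm{Var}^{3/2}(\tilde\xi^T \tilde A \tilde\xi)} \right\} . \tag{86}$$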


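Proof. Let $k$ be the rank of $\Gamma$ and $Q$ be an $n \times k$ full rank matrix such that $Q Q^T = \Gamma$. Let $\zeta \sim \mathcal{N}(0, I_k)$, where $I_k$ is the identity matrix of size $k \times k$. For any unitary matrix $U$, $U\zeta \sim \mathcal{N}(0, I_k)$ and hence $QU\zeta$ has the same distribution as $\xi$. Moreover, since $A$ is symmetric, so is $Q^T A Q$. We may choose a unitary matrix $U$ such that $\Lambda = U^T (Q^T A Q) U$ is a diagonal matrix with non-negative entries. Furthermore,

$$\xi^T A \xi \stackrel{d}{=} (QU\zeta)^T A (QU\zeta) = \zeta^T \Lambda \zeta , \tag{87}$$

where $\stackrel{d}{=}$ denotes the equality of distributions. Since $\Lambda$ is diagonal with non-negative diagonal entries $(\lambda_i)_{i=1,\dots,k}$, $\zeta^T \Lambda \zeta$ is a sum of independent r.v.'s of the form $\sum_{i=1}^{k} \lambda_i \zeta_i^2$. Since $E \zeta_i^2 = 1$ and $\mathrm{Var}(\zeta_i^2) = 2$, we get from (87) that $\sum_{i=1}^{k} \lambda_i = E[\zeta^T \Lambda \zeta] = E[\xi^T A \xi] = \mathrm{Tr}(A\Gamma) > 0$ and $\mathrm{Var}[\xi^T A \xi] = \mathrm{Var}[\zeta^T \Lambda \zeta] = 2 \sum_{i=1}^{k} \lambda_i^2$. Now set

$$S \stackrel{\mathrm{def}}{=} \sum_{i=1}^{k} d_i \zeta_i^2 \quad \text{with} \quad d_i \stackrel{\mathrm{def}}{=} \frac{\lambda_i}{\sum_{i'=1}^{k} \lambda_{i'}} , \tag{88}$$

so that

$$E[S] = 1 \quad \text{and} \quad \mathrm{Var}(S) = 2 \|d\|^2 , \tag{89}$$

where $\|d\|^2 = \sum_{i=1}^{k} d_i^2$. The quantities of interest in (84) and (85) become

$$E\left( \log[\xi^T A \xi] \right) - \log\left[ E(\xi^T A \xi) \right] = E(\log[S]) \tag{90}$$

and

$$\mathrm{Var}\left( \log[\xi^T A \xi] \right) = \mathrm{Var}(\log[S]) . \tag{91}$$

Since $\sum_{i=1}^{k} \lambda_i \ge \sum_{i=1}^{k} \lambda_i^2 / \max_{1 \le i \le k} \lambda_i$ and $\rho(\Lambda) = \max_{1 \le i \le k} \lambda_i$, we get

$$\|d\|^2 = \frac{\sum_{i=1}^{k} \lambda_i^2}{\left( \sum_{i=1}^{k} \lambda_i \right)^2} \le \frac{\rho^2(\Lambda)}{\sum_{i=1}^{k} \lambda_i^2} = \frac{2 \rho^2(\Lambda)}{\mathrm{Var}(\xi^T A \xi)} \le \frac{2 \rho^2(A) \rho^2(\Gamma)}{\mathrm{Var}(\xi^T A \xi)} . \tag{92}$$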


This is the quantity which appears in (84), and we will therefore express bounds in terms of $\|d\|$.

Denote by $F$ the distribution function of $S$, that is $F(x) = P(S \le x)$. Observe that $F(0) = 0$ since $S$ is a non-negative weighted sum of independent central chi-squares and the weights do not all vanish. To obtain exponential bounds on $F$, observe that, by standard computations on the chi-square distribution, one has, for $t > -(2 \max_{1 \le i \le k} d_i)^{-1}$,

$$E\left[ e^{-tS} \right] = \prod_{i=1}^{k} E\left[ e^{-t d_i \zeta_i^2} \right] = \prod_{i=1}^{k} (1 + 2 d_i t)^{-1/2} . \tag{93}$$

Therefore, for any $t \ge 0$ and $x > 0$,

$$\log[F(x)] \le \log\left[ e^{xt} E(e^{-tS}) \right] = xt - (1/2) \sum_{i=1}^{k} \log(1 + 2 d_i t) . \tag{94}$$

Using (94), we derive two bounds for $F(x)$ by choosing $t$ adequately. One bound, which will not depend on $\|d\|$, is for $x$ around 0; the other one, which will improve as $\|d\|$ decreases, is for $x$ in $(0, 1/2)$.

To get the first bound, observe that, for $t \ge 0$, $\prod_{i=1}^{k} (1 + 2 d_i t) \ge 1 + 2t \sum_{i=1}^{k} d_i = 1 + 2t$, so that $\sum_{i=1}^{k} \log(1 + 2 d_i t) \ge \log(1 + 2t)$. Plugging this inequality in (94) and setting $t = 1/(2x)$ yields

$$F(x) \le e^{1/2} \left( \frac{x}{1 + x} \right)^{1/2} \le e^{1/2} x^{1/2} , \qquad x > 0 . \tag{95}$$

Let $p \ge 1$ and $a \in \mathbb{R}$. Since $\lim_{x \to 0+} |\log(x)|^p F(x) = 0$, integration by parts and (95) give that

$$\int_0^1 |\log(x)|^p \, dF(x) = p \int_0^1 |\log(x)|^{p-1} x^{-1} F(x) \, dx \le p \, e^{1/2} \int_0^1 |\log(x)|^{p-1} x^{-1/2} \, dx ,$$

which is a finite constant. Since $\sup_{x \ge 1} (|\log(x)|^p / x)$ is finite and $E S = 1$, we get that $\int_1^{\infty} |\log(x)|^p \, dF(x)$ is bounded by a constant only depending on $p$ and thus

$$E\left[ |\log(S)|^p \right] = \int_0^1 |\log(x)|^p \, dF(x) + \int_1^{\infty} |\log(x)|^p \, dF(x) \le K_p , \tag{96}$$

where $K_p$ is a constant only depending on $p$. This bound proves the left part of the $\wedge$ sign in (84).

We now derive a second bound on $F(x)$ which will yield the right part of the $\wedge$ sign in (84). Since the second derivative of $\log(1 + u)$ has absolute value at most 1 for all $u \ge 0$, we have, by Taylor's formula, that, for any $t \ge 0$, $\log(1 + 2 d_i t) \ge 2 d_i t - 2 d_i^2 t^2$. Applying this to (94) and using $\sum_{i=1}^{k} d_i = 1$, we get

$$\log[F(x)] \le (x - 1) t + t^2 \|d\|^2 , \qquad x > 0 .$$

Setting this time $t = \|d\|^{-1}$, we obtain the following exponential bound:

$$F(x) \le \exp\left[ -(1 - x) \|d\|^{-1} + 1 \right] \le \exp\left[ -\|d\|^{-1}/2 + 1 \right] , \qquad x \in (0, 1/2) . \tag{97}$$

Using the relation $a \wedge b \le \sqrt{ab}$, $a, b \ge 0$, we can combine (95) and (97) to get

$$F(x) \le x^{1/4} \exp\left( -\|d\|^{-1}/4 + 3/4 \right) , \qquad x \in (0, 1/2) .$$

With this last bound of $F$ at hand, we can improve the bound established in (96) as follows. Let $p \ge 1$. Since $|\log(x)|^p x^{1/4}$ is bounded on $x \in (0, 1/2)$ and $|\log(x)|^{p-1} x^{-3/4}$ is integrable on $x \in (0, 1/2)$, we have, by integration by parts,

$$E\left( |\log(S)|^p \, \mathbb{1}\{S \le 1/2\} \right) \le \left[ |\log(x)|^p F(x) \right]_0^{1/2} + p \int_0^{1/2} |\log(x)|^{p-1} x^{-1} F(x) \, dx \le C_p \exp\left( -\|d\|^{-1}/4 \right) \le C_{p,a} \|d\|^{a} , \tag{98}$$

where $C_p$ and $C_{p,a}$ are constants only depending on $p$ and $(p, a)$. For $x \in (0, 1)$, we have $|\log(x) - (x - 1)| \le |\log(x)|$, and for $x \ge 1/2$, a Taylor expansion gives $|\log(x) - (x - 1)| \le 2(x - 1)^2$, since the second derivative of $\log(x)$ has absolute value at most $(1/2)^{-2} = 4$. Hence, for any $x > 0$,

$$|\log(x) - (x - 1)| \le |\log(x)| \, \mathbb{1}_{[0, 1/2]}(x) + 2 (x - 1)^2 \, \mathbb{1}_{[1/2, \infty)}(x) . \tag{99}$$
Since $E[S - 1] = 0$ and $E[(S - 1)^2] = \mathrm{Var}(S) = \|d\|^2 \mathrm{Var}(\zeta_1^2) = 2 \|d\|^2$, using (99) and (98) with $p = 1$ and $a = 2$, we get

$$|E[\log(S)]| = |E[\log(S) - (S - 1)]| \le E\left[ |\log(S)| \, \mathbb{1}\{S \le 1/2\} \right] + 2 E[(S - 1)^2] \le C_{1,2} \|d\|^2 + 4 \|d\|^2 .$$

Applying (90) and (92), we get the inequality (84) with the right part of the $\wedge$ sign.

The bound (85) is obtained by applying (96) since, by (91), $\mathrm{Var}\left( \log[\xi^T A \xi] \right) = \mathrm{Var}(\log(S)) \le E \log^2 S$.

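We now prove (86). Define $\tilde k$, $\tilde d_i$ and $\tilde S$ as we did $k$, $d_i$ and $S$. The LHS in (86) then reads

$$E\left[ \log(S) \log(\tilde S) \right] - E\left[ (S - 1)(\tilde S - 1) \right] = E\left[ (S - 1)\left( \log(\tilde S) - (\tilde S - 1) \right) \right] + E\left[ (\tilde S - 1)\left( \log(S) - (S - 1) \right) \right] + E\left[ \left( \log(S) - (S - 1) \right) \left( \log(\tilde S) - (\tilde S - 1) \right) \right] .$$

We will provide a bound for the first term of the RHS of this display, the other terms being treated similarly. By using (99) and the Cauchy-Schwarz inequality,

$$\left| E\left[ (S - 1)\left( \log(\tilde S) - (\tilde S - 1) \right) \right] \right| \le E\left[ |S - 1| \, |\log(\tilde S)| \, \mathbb{1}_{[0, 1/2]}(\tilde S) \right] + 2 E\left[ |S - 1| (\tilde S - 1)^2 \right]$$

$$\le \left( E|S - 1|^2 \right)^{1/2} \left( E\left[ |\log(\tilde S)|^2 \, \mathbb{1}_{[0, 1/2]}(\tilde S) \right] \right)^{1/2} + 2 \left( E|S - 1|^2 \right)^{1/2} \left( E|\tilde S - 1|^4 \right)^{1/2} .$$

In view of (92), it remains to show that the two last terms are $O(\|d\|^3 \vee \|\tilde d\|^3)$. By definition, $\tilde S - 1 = \sum_{i=1}^{\tilde k} \tilde d_i (\tilde\zeta_i^2 - 1)$, where $\{\tilde\zeta_i\}_{1 \le i \le \tilde k}$ are i.i.d. standard normal. Therefore,

$$E|\tilde S - 1|^4 = \mathrm{cum}_4(\tilde\zeta_1^2) \sum_{i=1}^{\tilde k} \tilde d_i^4 + 3 \left( \sum_{i=1}^{\tilde k} \tilde d_i^2 \right)^2 \mathrm{Var}^2(\tilde\zeta_1^2) ,$$

where $\mathrm{cum}_4(Z)$ is the fourth-order cumulant of the random variable $Z$. Since $\sum_{i=1}^{\tilde k} \tilde d_i^4 \le \left( \sum_{i=1}^{\tilde k} \tilde d_i^2 \right)^2$, we obtain that $E|\tilde S - 1|^4 \le C \|\tilde d\|^4$ for some constant $C$. Therefore,

$$\left( E|S - 1|^2 \right)^{1/2} \left( E|\tilde S - 1|^4 \right)^{1/2} \le C \|d\| \, \|\tilde d\|^2 \le C \left( \|d\|^3 \vee \|\tilde d\|^3 \right) .$$

Applying (98) with $p = 2$ and $a = 4$, we have

$$\left( E|S - 1|^2 \right)^{1/2} \left( E\left[ |\log(\tilde S)|^2 \, \mathbb{1}_{[0, 1/2]}(\tilde S) \right] \right)^{1/2} \le \left( \sqrt{2} \, \|d\| \right) \left( C_{2,4}^{1/2} \|\tilde d\|^2 \right) \le \sqrt{2 C_{2,4}} \left( \|d\|^3 \vee \|\tilde d\|^3 \right) ,$$

which concludes the proof. □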